Preface


Wireless communication has become essential to everyday life in almost
every country. Irrespective of place or situation, people depend on it to
meet their needs, and it is hard to remember a world before it became
critical to billions of lives. Rapid advances in wireless communications
and related technologies have brought newer technologies such as 6G, IoT,
and radar into use. Not only are these technologies expanding, but the
impact of wireless communication is also changing and becoming an
inescapable part of our lives.

With any newer technology, use brings responsibility along with many
disadvantages. Growing risks in security, authentication, user privacy,
and encryption are major areas of concern. Blockchain technology has
developed significantly alongside wireless networks and has proved
extremely useful in solving many security issues. An efficient, secure
cyber-physical system can be constructed using these technologies. This
book covers a wide range of situations regarding digital health and
intrusion detection in wireless networks. It lets readers reach solutions
using various predictive algorithm-based approaches and curated real-time
protective examples. The chapters also comprehensively state the privacy
and security challenges for various algorithms, and propose techniques and
tools for each challenge.

The book focuses on exposing readers to advances in data security and
privacy across wider domains. The techniques proposed in the chapters are
intended to overcome security vulnerabilities. The book aims to address
viable solutions to the various problems faced in newer wireless
communication techniques, improving accuracy and reliability in the face
of possible vulnerabilities and security threats. This book is useful for
researchers, academicians, R&D organizations, and healthcare professionals
working in the areas of antennas, 5G/6G communication, wireless
communication, digital hospitals, and intelligent medicine.


The key features of the book are: 


• Serves as a strong technological convergence solution for
wireless communications in the cyber security domain

• Explains the foundations of wireless communication networks
embedded with cyber-physical systems and foundational
topics of blockchain

• Explores the practical issues in the automation domain

• Highlights AI-powered analytics for analysing the
characteristics of wireless user behaviour security models

• Offers key insights into blockchain joining forces with wireless
communication security to set up flawless cyber-physical
systems

Dr. R. Maheswar


BBUCAF: A Biometric-Based User
Clustering Authentication Framework 
in Wireless Sensor Network 


Rinesh, S.1*, Thamaraiselvi, K.2, Mahdi Ismael Omar1
and Abdulfetah Abdulahi Ahmed1


1Department of Computer Science, Jigjiga University, Jijiga, Ethiopia
2Department of Computer Science, Malla Reddy College of Engineering,
Hyderabad, Telangana, India


Abstract 

Wireless Sensor Networks (WSNs) have progressed considerably in the last few
years, so data transmission must become more secure. Cryptographic keys keep
information private, authenticate users, and keep data safe. Several research
projects have addressed key management issues in WSNs. Prime numbers are used
to make collective keys, which then allow the security of nodes to be checked
accurately. A new network model is designed for sending data between nodes
without restriction. A strong authentication system is needed to maintain
network safety and allow users to access network services freely, but the
limited resources of sensor nodes make user authentication difficult. To
overcome these security issues, a biometric-based user clustering
authentication framework (BBUCAF) is introduced to increase the level of
security and the network's speed among the nodes. A biometric-based model is
created by extracting features from the fingerprint. The feature vectors are
used to securely create a private key for the user, and this key is sent to
every sensor node. Private keys between sensor nodes are then made by
combining a randomly generated count with the user's key. C-means clustering
is used to group nodes based on their range and unique identification. A
collective key is made using a fuzzy registration component that considers
prime numbers. Fuzzy membership and biometric-based secret keys are used to
send data between groups and sensor nodes. Each cluster has group keys that
differ from one cluster to the next. The network's speed improves its
effectiveness by cutting down on network traffic, protecting against DoS
attacks, and extending node battery life through lower energy usage.


Keywords: Wireless sensor network, nodes, clustering, network traffic, 
authentication 


1.1 Introduction to Wireless Sensor Network 


Several sensors can be used together in a single WSN. Nodes in the net- 
work that sense their surroundings are known as sensor nodes [1]. A wide 
range of applications, such as structural health monitoring, environmental 
control, and combat observation, can benefit from such connections [2]. A 
node can perform computing, identify itself, and communicate with other 
devices [3]. Those nodes can be dispersed in a situation where they can 
identify each other and work together to accomplish the task in a large 
region [4, 5]. Sensor nodes in WSNs are used for specific tasks [6]. Small
sensor nodes in the network sense their surroundings and model the
information they collect [7]. Due to their wide range of applications, WSNs
are becoming increasingly popular in academia and industry [8]. WSNs are
primarily designed to gather environmental information and send it to a home
or remote location via a network of sensing devices located in an isolated
area [9]. The original data are then processed online or offline, as the
application requires, for a full evaluation at the remote location [10].
For example, remote patient tracking is important for doctors when a patient
is not in the hospital.

These systems can benefit numerous applications, including structural
health monitoring, environmental control, and combat monitoring [11]. Most
applications allow users to obtain data directly from a gateway node because
queries are handled on this node in most cases [12]. On rare occasions,
however, information is very hard to obtain from a gateway node, so it is
collected directly from the sensor nodes [13]. By sending a request to a
sensor node, unauthorized users can quickly obtain sensitive information
[14]. Because sensor nodes cannot verify query messages, sensitive data may
leak, and network resources such as node power and bandwidth could be
wastefully depleted [15]. Any or all of these issues could reduce the
network's lifespan and effectiveness, making the system inaccessible to
genuine users [16]. Since network data and resources can be illegally
accessed, authentication is necessary [17]. To achieve this, sensor nodes
must validate users' identities [18]. All of the above issues can be solved
with user authentication, which allows only authorized users to join a
system [19]. Because of the resource limits of WSNs' small sensor devices,
namely their power and storage along with their processing and transmission
capabilities, providing authentication in these networks is very difficult
[20]. Even though several standards have been presented, the authentication
procedure is still vulnerable. A more robust and intelligent process is
therefore needed to assure the security of a WSN [21]. Maintaining a safe
network requires a robust authentication system that allows users to access
network services without restriction, yet authentication is difficult due to
the limited resources of sensor nodes [22, 23]. To overcome the
above-mentioned security issues, BBUCAF has been developed. The main
contributions of BBUCAF are:


> To build a biometric model that enhances the network's security
and performance using the unique characteristics of fingerprints.

> To generate the user's private key securely from feature
vectors. Every sensor node receives this key; each sensor node
then receives a random count, which is combined with the
user's private key.

> To gain the benefits of a faster network, including reduced
network traffic, prevention of denial-of-service (DoS) attacks,
and longer node battery life.


1.2 Background Study 


Many researchers have worked in this area. Tsu-Yang Wu et al. [24]
developed a Three-Factor Authentication Protocol (TAP), whose safety is
confirmed by informal analysis, Burrows-Abadi-Needham (BAN) logic, and the
ProVerif tool. The evaluation of security and performance reveals that the
method offers stronger security and a reduced computational burden.

P.P. Devi et al. [25] proposed SDN-Enabled Hybrid Clone Node
Detection Mechanisms (SDN-HCN). The SDN-based methodology performs
network path evaluation and time-based analysis to identify and reduce
duplicate nodes produced by cloning attacks, and the HCN technique is used
to identify clone nodes in a wireless network. Several metrics are analyzed
in the simulation experiments.

M. Rakesh Kumar et al. [26] introduced a Secure Fuzzy Extractor-based
Biometric Key Authentication (SFE-BKA) scheme. The hash function is
critical to the system's security. In SFE-BKA, the hash parameter value is
independent of the hash function used, in an attempt to improve information
security, and this variation in hashing does not affect the method's latency
or delay. SFE-BKA yielded 40% less data loss, 20% less energy usage, and
less latency than earlier encoding systems.

S. Ashraf et al. [27] developed a Depuration-based Efficient Coverage
Mechanism (DECM). Two rounds of deployment are required to complete
the process. When a node is to be moved, the Dissimilitude Enhancement
Scheme (DES) is used to find its new location. The Depuration mechanism
in the second round reduces the separation between the prior and new
locations by controlling needless migration of the sensor nodes. According
to the simulation findings, DECM attained more than 98% protection with a
computation time of 0.016 seconds.

Fan Wu et al. [28] described an Authentication Protocol for Wireless
Sensor Networks (AP-WSN). Formal verification with ProVerif shows that the
new system retains its security features. AP-WSN is feasible, counters
various threats, and meets general security requirements. The proposed
approach outperforms previous schemes in terms of security and is suitable
for practical use. The simulation findings indicate that the scheme can be
successfully implemented in an IoT system.

Diksha Rangwani et al. [29] discussed an improved privacy-preserving
remote user authentication (PP-RUA) scheme. The scheme is formally
analyzed using the probabilistic Random Oracle Model to show its
resilience, and it is simulated using the well-accepted AVISPA tool to show
its security strength. The performance assessment demonstrates that, along
with its consistency in terms of privacy, the scheme has lower computing
and networking overheads than other current schemes.

SungJin Yu et al. [30] discussed Secure and Lightweight Three-Factor-
Based User Authentication (SLUA). SLUA enables secure, untraceable, and
mutually authenticated communication. Informal and formal methods are used
to assess the safety of SLUA, including Burrows-Abadi-Needham (BAN) logic,
the Real-or-Random (ROR) model, and the AVISPA simulation. The performance
of SLUA in WSNs is compared with other existing systems, and the proposed
SLUA is found to be more secure and more efficient than the prior
techniques.

Many problems in sensor networks remain security related. The proposed
BBUCAF concentrates on these security issues, and its experimental results
are compared with [25], [26], and [27].


1.3 A Biometric-Based User Clustering
Authentication Framework 


1.3.1 Biometric-Based Model 


The biometric-based model starts with the registration stage, in which a
verified node serves as the initial registration point for new users. The
biometric features of each user are captured, and a hash is calculated from
the features. The overall architecture of BBUCAF is shown in Figure 1.1.
Features are extracted from the biometric-based model by a transform, and
C-means clustering is applied based on node identification. The fuzzy
membership function is used to send data between the cluster and the sensor
node. The effectiveness of the network is increased through less traffic
among the networks and protection against attacks.

The verified node then receives the user's identification and hash
code, as shown below:


$$a = [\mathit{identity},\ y], \qquad y = g(\mathit{biometric},\ x) \tag{1.1}$$


Figure 1.1 Architecture of BBUCAF.

The identification a and the hash code y are obtained from Equation
(1.1), where identity denotes the identity element, x and y relate to the
verification node, and g denotes the hash function of the registration
phase. After a valid registration, the verified node calculates r and
transmits it to the user, as indicated in Equation (1.2). Verified nodes use
the registered value to obtain the necessary information from the network.


$$a_1 = [r], \qquad r = g(\mathit{identity} \,\|\, u_0) \tag{1.2}$$


The registration value r is obtained from Equation (1.2), where $a_1$
represents the next stage of identification, identity denotes the identity
element, g denotes the hash function of the registration phase, and $u_0$
denotes the initial stage. The stages involved in obtaining the encrypted
data are shown in Figure 1.2. The biometric identification phase captures
the user's biometric data, and the verification stage compares the collected
data with the stored data. The registration phase compiles new data to be
passed to the verification stage, and all the data collected from the sensor
node is given to the registration phase. The biometric data and the hash
values are compared to guard against attacks.
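
As a rough illustration of the registration and verification steps, the sketch below hashes a quantized biometric feature vector under a node secret to form y, and later compares the stored hash with one computed from a fresh capture. HMAC-SHA256, the quantization step, and the names `quantize`, `registration_hash`, `verify`, and `secret` are assumptions made for this example; the chapter does not specify the function g. Real biometric readings are noisy, so exact hashing of this kind only works once the features have been stabilized, which the simple binning here merely stands in for.

```python
import hashlib
import hmac
import struct


def quantize(features, step=0.05):
    """Map each real-valued feature to an integer bin (assumed stabilization step)."""
    return [int(round(f / step)) for f in features]


def registration_hash(features, node_secret, step=0.05):
    """Assumed form of y = g(biometric, x): HMAC-SHA256 over the quantized features."""
    bins = quantize(features, step)
    packed = struct.pack(f">{len(bins)}i", *bins)
    return hmac.new(node_secret, packed, hashlib.sha256).digest()


def verify(stored_y, fresh_features, node_secret, step=0.05):
    """Compare the stored hash y with the hash y' of a fresh capture in constant time."""
    fresh_y = registration_hash(fresh_features, node_secret, step)
    return hmac.compare_digest(stored_y, fresh_y)


# Usage: enrol once, then check a slightly noisy capture and a different sample.
secret = b"verified-node-secret"  # hypothetical value standing in for x
enrolled = registration_hash([0.12, 0.47, 0.83], secret)
same_user = verify(enrolled, [0.121, 0.472, 0.829], secret)   # falls in the same bins
other_user = verify(enrolled, [0.90, 0.10, 0.30], secret)     # falls in different bins
```

Keying the hash with a node secret means that a captured hash cannot be recomputed from the biometric features alone, which is one common reason to prefer an HMAC over a bare hash in this position.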

Figure 1.2 Different stages to getting the encrypted data.

The biometric data is captured and hashed a second time, and the result is
sent to the sensor node together with the identity and the required
information, as given in Equation (1.3), where y = g(biometric) represents
the user's most recent biometric information.


$$y = g(\mathit{biometric},\ RI,\ t_1) \tag{1.3}$$


Here $t_1$ represents the user's current time and RI represents the required
information. Initially, the sensor network examines the time stamp of a
message that it receives at time $t_2$. The query is refused if the elapsed
time $t_2 - t_1$ exceeds $\Delta t$; otherwise the request is routed to a
known network at time $t_3$ for user authentication using its own identity.
$\Delta t$ represents the estimated time interval; the sensor network is used
to identify the sensor node capable of responding to user inquiries.
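
The time-stamp check can be sketched as follows, assuming the conventional replay guard in which a query is refused once its age reaches the allowed interval. The function names, the string return values, and the extra rejection of time stamps that lie in the future are assumptions of this example rather than details given in the chapter.

```python
def is_fresh(t_sent, t_received, delta_t):
    """Return True when the message age is non-negative and below delta_t."""
    age = t_received - t_sent
    return 0 <= age < delta_t


def handle_query(t_sent, t_received, delta_t):
    """Forward fresh queries for authentication and refuse stale ones."""
    return "forward" if is_fresh(t_sent, t_received, delta_t) else "refuse"


# Usage: a query stamped at t = 100.0 s, with an allowed interval of 2 s.
decision_on_time = handle_query(100.0, 100.5, 2.0)
decision_late = handle_query(100.0, 103.0, 2.0)
```

A check of this kind relies on loosely synchronized clocks between the user device and the sensor node, so the interval is usually chosen with clock drift and expected transmission delay in mind.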


$$a_2 = [\mathit{identity},\ b,\ t_3], \qquad b = g[\mathit{identity},\ y'] \tag{1.4}$$


The response stage for user queries, $a_2$, which works with the sensor
network to reduce network traffic, is obtained from Equation (1.4). Here b
represents the user authentication value, identity denotes the identity
element, $t_3$ represents the time at the known network, and y' represents
the user's most recent biometric information. The verified node compares y
and y'. If they do not match, the trusted node delivers a decline signal,
[decline], to the sensor node, and the sensor network forwards it to the
user. Otherwise, a [going on] notification is sent, and after receiving it
the user can begin the verification procedure.

The authentication stage and secret key generation for each cluster
head are shown in Figure 1.3. The user authentication stage extracts
features from the biometric signal, and the secret key generation serves the
different sensor nodes $m_i$ and $m_j$ in the cluster group. Each network
device holds a clear-text value and an encrypted random counterpart as part
of the configuration process.

After the authentication step, feature patterns are taken from the user's
fingerprint, and the secret keys are created from them. Each sensor node
receives a copy of this key using a pseudorandom generator. Cluster group
keys are unique from one cluster to the next. As each node has access to
both private and public data, adjacent nodes can silently share a key.
Authentication keys are used in the next phase to ensure that each pair of
nearby nodes has a unique key before beginning a secure connection over an
authorized link. As a result of this strategy, nodes $m_i$ and $m_j$ are
protected from each other by securely distributing comparable values. The
pairwise biometric key $b_{ij}$ uses pseudorandomness and the biometric
feature vector to provide a secure result. Each network device has a
clear-text value and an encrypted random counterpart (q, D) as part of the
preconfiguration process. In addition, a small number of randomly selected
prime numbers are divided into groups, with the intersection of any two
different groups containing a single prime number.

Figure 1.3 The authentication stage and secret key generation for each cluster head.

The goal of such groups is to divide up the work of computing keys
between each pair of adjacent nodes. Each of the nodes $m_i$ and $m_j$
selects one group at random, and the intersection of the two groups is
taken as the common prime number selected by $m_i$ and $m_j$. A biometric
value and a pseudorandom function are used to make $b_{ij}$. These are then
treated as secret data and must be sent safely. After extraction, an
attacker can immediately target any node to recover the expert key in use.
So, in this case, la is added only to certain nodes. Before the
implementation stage, la is added to a limited number of nodes to make it
less likely that an attacker can retrieve a volatile key, without giving up
its short lifespan. Because each node already holds private and public data,
it only needs to transfer the biometric pairwise key, and this can be
achieved before installation begins.
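
One way to picture the prime groups and the pairwise key is sketched below. Every pair of groups is given exactly one prime in common, two nodes that picked different groups recover that prime from the intersection, and a keyed pseudorandom function mixes it with the user's biometric key and a random value. The use of the first few primes instead of randomly selected ones, HMAC-SHA256 as the pseudorandom function, and all function and variable names are assumptions for illustration; the chapter does not fix these details.

```python
import hashlib
import hmac
import itertools
import os


def first_primes(count):
    """Return the first `count` primes by trial division (fine for small counts)."""
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes if p * p <= n):
            primes.append(n)
        n += 1
    return primes


def build_prime_groups(num_groups):
    """Assign one distinct prime to every pair of groups, so any two groups share exactly one."""
    pairs = list(itertools.combinations(range(num_groups), 2))
    groups = [set() for _ in range(num_groups)]
    for prime, (g, h) in zip(first_primes(len(pairs)), pairs):
        groups[g].add(prime)
        groups[h].add(prime)
    return groups


def pairwise_key(user_key, shared_prime, nonce):
    """Derive b_ij from the user's key, the shared prime, and a random nonce (assumed PRF)."""
    message = shared_prime.to_bytes(8, "big") + nonce
    return hmac.new(user_key, message, hashlib.sha256).digest()


# Usage: node i picked group 0 and node j picked group 2 (different groups assumed).
groups = build_prime_groups(4)
shared = (groups[0] & groups[2]).pop()
user_key = hashlib.sha256(b"hypothetical biometric feature bytes").digest()
nonce = os.urandom(16)
key_at_i = pairwise_key(user_key, shared, nonce)
key_at_j = pairwise_key(user_key, shared, nonce)
```

Because both nodes can compute the intersection locally, only the group choice and the nonce need to be exchanged; the shared prime and the resulting key never travel over the radio link in this sketch.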

If la is placed into networks shielded from attackers, this strategy
works well. The value la can be incorporated into barrier sensors through an
attack. Even if an opponent manipulates the cluster and removes it from the
network, the probability of obtaining a node containing a special key is
la/M, where la is the number of networks with expert keys.

A random encrypted key is generated for the user XR when he or she
registers with the system. The generated key is stored on the access point
as the key for XR. A biometric encryption pattern is created from the user
key and the connection nodes by extracting the properties of the user
identity with the transform. The random encrypted key is used for validation
and to generate a key from the fingerprint of XR. When a piece of the
fingerprint pattern is used for unencrypted biometric encoding, the pattern
is massive; a pseudorandom function is used to generate results of variable
size from inputs of a set size. In addition, each node has a random value
saved in the biometric encryption template that can be used to reject
invalid keys at the beginning of the authentication process.

The wavelet transform converts data from the spatial domain to the
frequency domain, separating low- and high-frequency information. In the
two-dimensional case, a one-dimensional transform is first applied to each
row and then to each column.
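
The row-then-column procedure can be sketched with a single level of the Haar wavelet, the simplest wavelet family. The choice of Haar, the single decomposition level, and the function names are assumptions for this example, since the chapter does not name the wavelet it uses; the input is assumed to have an even number of rows and columns.

```python
import math


def haar_1d(values):
    """One level of the orthonormal Haar transform: averages first, then differences."""
    scale = math.sqrt(2.0)
    low = [(values[i] + values[i + 1]) / scale for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / scale for i in range(0, len(values), 2)]
    return low + high


def haar_2d(image):
    """Apply the 1-D transform to every row, then to every column of the result."""
    rows_done = [haar_1d(row) for row in image]
    cols_done = [haar_1d(list(col)) for col in zip(*rows_done)]
    return [list(row) for row in zip(*cols_done)]  # transpose back to row-major order


# Usage: a flat 4x4 patch has no detail, so only the low-frequency quadrant is non-zero.
flat_patch = [[1.0] * 4 for _ in range(4)]
coeffs = haar_2d(flat_patch)
```

After one level, the top-left quadrant holds the low-frequency approximation and the other three quadrants hold horizontal, vertical, and diagonal detail, which is the separation of low- and high-frequency content that feature extraction relies on.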


1.3.2 Clustering 


Clustering is based on the construction process for groups. In the
beginning, a hijacked node can present information to its own portion of a
cluster while avoiding distribution to other devices in the network. The
network structure separates a cluster into multiple sub-clusters, which
minimizes the cluster's bulk data. The number of nodes in a cluster affects
the probability that a corrupted node is selected as the cluster head by
chance. Consider a group with a few participants and another cluster with
more participants, each of which contains damaged nodes. A team leader and
susceptible nodes can take part in the selection process, and in these
cases a cluster with a damaged node as group head is to be expected. The
grouping key is distributed throughout the groups for group-to-group
interaction. Each device in the network receives a biometric feature array
from the cluster head. The data collected from the biometric feature array
is divided into N clusters, and each data item is partially assigned to
each group using the c-means data clustering method. A data point's degree
of membership in a cluster increases the closer it is to the cluster's
center and decreases the further it is from the center. C-means clustering
is executed by a clustering function. The starting point is a random
estimate of the cluster centers, which mark the average position of each
cluster. Following this, the function gives each data point a random
membership grade in each cluster. It then moves the cluster centers to
their proper positions within the data set and calculates the degree of
membership in each cluster for every data point by iteratively updating the
cluster centers and the membership grades. To begin cluster construction, a
cluster head node emits a clustering signal. Biometric characteristic
vectors and the signal type are verified using private keys to prevent
malicious activities. In Figure 1.4, $x_{b1}, x_{b2}, x_{b3}, \ldots, x_{bn}$
represent the level of consistency of different users; their biometric
characteristics are fed to group keying, which prevents various DoS attacks
in the sensor network.
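
The iterative procedure described above corresponds to the standard fuzzy c-means algorithm, which can be sketched in a few lines. This is a generic textbook version under assumed settings (fuzzifier m = 2, Euclidean distance, random initial membership grades, a fixed seed), not the authors' implementation; the function name and the sample points are hypothetical.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iters=100, tol=1e-6, seed=0):
    """Return (centers, memberships); memberships[i][c] lies in [0, 1] and each row sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Start from random membership grades, normalized per data point.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iters):
        # Each center is the mean of all points weighted by membership**m.
        centers = []
        for c in range(n_clusters):
            w = [u[i][c] ** m for i in range(n)]
            centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                            for d in range(dim)])
        # Memberships grow as a point gets closer to a center.
        shift = 0.0
        for i, p in enumerate(points):
            dist = [max(sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5, 1e-12)
                    for ctr in centers]
            new_row = [1.0 / sum((dist[c] / dist[k]) ** (2.0 / (m - 1.0))
                                 for k in range(n_clusters))
                       for c in range(n_clusters)]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # stop once the membership grades have settled
            break
    return centers, u


# Usage: two well-separated groups of hypothetical 2-D feature points.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, memberships = fuzzy_c_means(pts, n_clusters=2)
```

Unlike a hard assignment, every point keeps a grade in every cluster, so a node that lies between two cluster heads is reflected as such instead of being forced into one group.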

If a node receives information from several cluster member networks, it
links to the first message that arrives and rejects the others. Nodes
verify their keys when they get a cluster member message. A node joins the
group and retransmits the signal if the message is verified correctly. The
cluster member messages provide information on a cluster's primary key.
Sending a signal simply adds the signal type and the identity of the cluster
message node to the statement. The cluster head and its members interact
with the sink node through the cluster head. The information channel between
the cluster members and the group head is exposed.

Figure 1.4 The clustering stage and keying process to protect from different attacks.

Let $A = \{x_a\}$, $a = 1, \ldots, M$, where $M$ is the number of nodes in
the sensor network configuration specified in the cluster. The clusters
generated by the nodes in $A$ are indexed by $b = 1, \ldots, n$ and are
represented as n-dimensional vectors. For the network configuration, let $X$
be the $M \times n$ matrix with entries $X_{ab}$, where $X_{ab}$ indicates
the level of consistency of $x_a$ with the group indicated by $DK_b$.
Finally, let $x_b = (x_{b1}, \ldots, x_{bn})$ be the b-th row of $X$, which
contains the level of consistency of $x_b$ for each cluster. The level of
consistency and the time taken for processing are shown below:


$$X_{ab} \in [0, 1], \qquad a = 1, \ldots, M;\ b = 1, \ldots, n \tag{1.5}$$


The level of consistency $X_{ab}$ and the time taken for processing the
encryption are obtained from Equation (1.5); here, a and b index the nodes
and the groups, and M and n describe the network configuration.

After deployment is complete, nodes send a packet to their entire network
to identify their neighbors and use it as a biometric-based request to
commence the keying step. The packet delivered to every node includes an
encoded group and a sequence number, and the energy usage of each node and
of the cluster head is calculated. In the storage stage, nodes store the
requests they have received, the respective identifiers, and the
accompanying encoded information. Using biometric characteristics and
dynamic variables, a node $mx_i$ starts the construction of a biometric
pairing key together with its neighbors. This technique makes it possible to
keep the lifetime of a volatile key to a minimum.

A biometrically verified broadcast transmission prevents an offender from
making a misleading pairwise request. Otherwise, the node waits until it
receives a request from a neighboring node. Once la has been obtained by the
node $mx_i$, it can start encrypting and decrypting data with any one of its
neighbors, such as $mx_j$, and then transmits to $mx_j$. In its simplest
form, the request includes the node's identification and selected group. As
a result, a matching key is generated by every pair of adjacent nodes under
the circumstances described above. The shared key between $mx_i$ and $mx_j$,
which protects the sensor network against attacks, is shown below:


$$S_{ij} = S_{\text{attacks}}(mx_i \cup la,\ mx_j) \tag{1.6}$$


The protection stage of the network against different attacks is obtained
from Equation (1.6); here $mx_i$ and $mx_j$ hold different keys in the
nodes, and la represents the request from the neighbor node. A fuzzy
membership criterion is used in this case to increase safety. The membership
value generates new prime numbers before the collision operation is
performed, and a rectangular association value is used to generate new
unique quantities. C-means clustering provides superior results compared to
other algorithms, especially for overlapping data sets. In contrast to
k-means, where every data point must belong to exactly one cluster center,
here data points are given a degree of membership in every cluster center.
BBUCAF raises the security level while increasing the speed of the network,
with less network traffic and protection against various attacks.


1.4 Experimental Analysis 


A network with many sensors is modeled to evaluate the system's privacy
and efficiency. To measure the performance of the various methods, several
sensor networks are simulated under attack, aiming at high-speed performance
with less traffic. The network's performance is improved through reduced
traffic among the networks and protection against attacks; network traffic
is evaluated in terms of latency, as shown in Figure 1.5.

The network's activity is implemented computationally in this simulation
scenario. For example, parameters such as the percentage of attacks detected
and the time taken to respond to well-known sensor network attacks can be
calculated.

Figure 1.5 Latency of BBUCAF.

Figure 1.6 Computational time of BBUCAF.


Figure 1.6 shows the outcomes of an analysis of various strategies with
respect to the running time of the system under different node counts. The
suggested BBUCAF system takes less time to execute than the alternative
approaches. Biometric key authentication is used to make communication more
secure, resulting in a running time of a few milliseconds for different node
counts.

Eu is used to calculate the total energy usage of the network, as shown
below.


$$E_u = (l + 1) \cdot m(12) \tag{1.7}$$


The total energy consumption is obtained from Equation (1.7) and shown
in Figure 1.7; here, l represents the sensor networks that help send m
messages between nodes. As a result, the energy efficiency is directly
related to the amount of time spent concentrating, the size of the packet
sent, and the amount of time spent decrypting and encrypting data.

The delay metric captures the typical end-to-end lag experienced by data
packets as they travel across the network. The term “end-to-end delay”
refers to the average amount of time it takes for a packet sent from a
source to reach its destination. Trials are used to calculate the delay, as
shown in Figure 1.8.

Figure 1.7 Energy usage of BBUCAF.

Figure 1.8 Delay rate of BBUCAF. [Bar chart of delay rate (%) against number of nodes; the legend includes BBUCAF and SDN-HCN.]


$$\text{delay} = \frac{\sum_{l} Q_{l}}{m_t} \tag{1.8}$$


The delay for the entire network is obtained from Equation (1.8); here
l indexes the different sensor networks and $m_t$ represents the total
number of packets collected.
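
A minimal sketch of the average end-to-end delay computation is given below, reading Equation (1.8) as the sum of per-packet delays divided by the number of packets collected. That reading, the dictionary-based packet log, and the function name are assumptions of this example; packets that were sent but never received are simply left out of the average.

```python
def average_end_to_end_delay(send_times, receive_times):
    """Mean of (receive time - send time) over the packets that were delivered."""
    delays = [receive_times[pid] - sent
              for pid, sent in send_times.items()
              if pid in receive_times]
    if not delays:
        raise ValueError("no delivered packets to average over")
    return sum(delays) / len(delays)


# Usage: three packets sent, two delivered (times in seconds, hypothetical values).
sent_log = {1: 0.0, 2: 1.0, 3: 2.0}
received_log = {1: 0.5, 2: 1.25}
mean_delay = average_end_to_end_delay(sent_log, received_log)
```

Lost packets are usually reported separately as a packet loss or delivery ratio, since including them in the delay average would require choosing an arbitrary timeout value.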

Figure 1.9 compares the outcomes of different strategies for attack
detection based on the number of nodes. Compared to previous approaches, the
attack detection rate of the proposed BBUCAF system is higher. A biometric
key authentication method is used to make interaction more secure, which
gives the proposed system better attack detection performance than the
existing systems.

Figure 1.9 Attack detection rate of BBUCAF.

Figure 1.10 Computational time of BBUCAF.

To test the effectiveness of the different methods, the computation time is
expressed in terms of the multiplication time, the hash function time, and
the completion time. The cost of the fuzzy extractor is comparable to that
of a multiplication on the algebraic curve, while the cost of bitwise
operations may be disregarded. BBUCAF uses the clustering node to decrease
the disparity between the computational loads of the center and the nodes,
which can enhance performance. The computational time of BBUCAF is shown in
Figure 1.10. As a result, only the costs of curve multiplication operations
and hash functions need to be considered when determining the computational
cost.


1.5 Conclusion 


To deal with security-related problems, BBUCAF has been introduced to
improve network security and speed. The fingerprint is used to develop a
biometric-based model. The user's private key is generated securely using
feature vectors. Sensor nodes receive the key in the form of an encrypted
message. The user's key is given to each sensor node, and a random number is
used to create private keys between them. A fuzzy registration component
that considers prime numbers is used to create a collective key. Data is
sent between groups and sensor nodes using fuzzy membership and
biometric-based secret keys. The set of group keys used by each cluster is
distinct from the sets used by other clusters. The increased network speed
improves the network's effectiveness by reducing network traffic, lowering
energy usage, and guarding against DoS attacks.
Abstract

Technological innovation has become ubiquitous and imperative in day-to-day life. Consequently, there has been a massive upsurge in malware evolution, which poses a substantial security hazard to organizations and individuals. This advancement in the capabilities of malware opens new cybersecurity research dimensions in malware detection. It is nearly impossible for anti-virus applications that rely on traditional signature-based methods to find novel malware, and such methods incur high overhead with respect to memory and time. This is because malware developers explore new methodologies to evade these traditional malware defense approaches. To solve the problem, machine learning algorithms are used to learn the distinctions between malware and benign apps automatically. Unfortunately, traditional machine learning approaches built on handcrafted features are rather ineffective against these evasive practices and require more effort owing to feature engineering. To overcome such limitations, this work proposes a malware detection system called DeepNet based on deep learning techniques. We focus on the application of deep learning frameworks for malware detection by evaluating their effectiveness when malware is represented by high-level and low-level features, respectively. Two deep learning models, Stacked Autoencoder (SAE) and Deep Belief Networks (DBN) with Restricted Boltzmann Machine (RBM), are utilized to extract better features. SoftMax and Deep Neural Network (DNN) classifiers are utilized for malware detection and classification. Comprehensive experiments are conducted on four benchmark malware datasets, namely the Malimg, BIG 2015, MaleVis, and Malicia datasets. The outcomes imply that the proposed hybrid architecture can detect new malware samples with improved accuracy and lower false positive rates than conventional malware prediction systems while keeping computational time low. The proposed hybrid framework is also reliable and effective against obfuscation attacks in malware recognition.


Keywords: Malware detection, deep learning, attribute reduction, feature 
engineering, dimensionality reduction 


2.1 Introduction 


Modern computer technology and the Internet have made life simpler and easier for people. Nowadays, almost anything can be done online, including social contact, financial transactions, and tracking changes in the human body. These advancements tempt cybercriminals to commit crimes online rather than in the physical world [1]. Recent scientific and commercial publications estimate that cyberattacks cost the global economy trillions of dollars. Malware is a common tool used by online criminals to launch attacks. Malware is any software that performs unauthorized and suspicious actions on the computers of its victims. The different varieties of malware include viruses, worms, Trojan horses, and ransomware, which can steal sensitive information, launch attacks, and wreak havoc on computer systems [2]. The latest malware variants hide themselves on the victim’s system by encrypting and packing their data. These novel varieties propagate by using people’s trust as a vehicle for infection. For instance, well-known malware transmission vectors include email attachments and viewing or downloading files from bogus websites. To keep computer systems safe, we must identify malicious software as soon as it affects the systems. Malware classification is the process of examining files to determine whether they are malicious or benign [3].

Machine learning technologies, along with cloud computing and blockchain, are used in these procedures to boost the detection rate. Various malware detection strategies exist, and they are named according to the methods and tools they employ. The key methods include memory-based detection, model-based detection, model validation, behavior analysis, and signature analysis [4]. A signature-based strategy works well against known malware and related variants. However, it is unable to find malware that has not yet been seen. Although the other detection systems can successfully identify some unknown malware components, they fall short when it comes to identifying sophisticated malware variants that employ packing and obfuscation tactics [5].

The inadequacies of current malware recognition technologies have been addressed in recent times using deep learning-based approaches. Numerous fields, including image processing, NLP, human action recognition, and facial recognition, have made substantial use of deep learning [6]. Deep learning has also taken on a growing role in the field of cybersecurity, particularly in malware recognition. Deep learning is a subset of artificial intelligence founded on artificial neural networks. It learns from past instances and employs numerous hidden layers. A variety of deep learning architectures, including deep neural networks (DNN), recurrent neural networks (RNN), convolutional neural networks (CNN), and deep belief networks (DBN), have been employed to improve model performance [7].

In this work, a hybrid deep learning approach, DeepNet, is proposed for classifying malware. The proposed model is trained using three datasets: Malimg, MaleVis, and BIG 2015. The input malware images in these datasets are first converted into binaries before they are fed to the training process. After conversion, the bit/byte-level sequences are forwarded to the feature extraction process. Stacked Autoencoders (SAE) along with Deep Belief Networks (DBN) and Restricted Boltzmann Machines (RBM) are used for extracting the features. The malware classification is implemented using the softmax classifier. According to the test results, the suggested method can successfully extract distinguishing characteristics for each type and family of malware in order to classify it. The experimental findings also demonstrate that the suggested deep learning algorithm classifies several malware variants with excellent accuracy, outperforming the most recent methods described in the literature.

The novelty of the proposed work is as follows:


i) A powerful and fast DL-based malware recognition framework is presented that operates on raw binaries and requires no binary execution (behavior analysis), reverse engineering, or code disassembly language skills.

ii) The proposed hybrid model uses a pretrained deeply connected DBN with RBM and SAE (DeepNet) to achieve faster preprocessing and training of binary samples. The DeepNet model allows concatenation of features and uses fewer parameters than other DL models. The distinctive deep supervision mechanism of the DeepNet model contributes to effective malware detection. Moreover, the deep connections, with their regularizing power, help reduce overfitting when fewer malware training samples are available.

iii) The issue of data imbalance in classifying malware is addressed by reweighting the class-balanced categorical cross-entropy loss function in the softmax layer.

iv) We conduct an extensive evaluation on four different malware datasets, of which three are used for training and one is used for testing the proposed model. The results show that the proposed framework is robust and effective. It is also resilient against the evolution of modern malware over time and against anti-malware evasion strategies.

v) The proposed hybrid framework achieves accuracy rates of 98.7%, 98.5%, and 98.2% for the three datasets and 90.2% for the unseen (Malicia) dataset. The model achieves improved computational performance with reduced space and time complexity, thus yielding a practical malware recognition framework.
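
The class reweighting of the cross-entropy loss mentioned in item iii) can be illustrated with a small sketch. The chapter does not give the exact weighting formula, so the inverse-frequency weights used here are an assumption, and the names `class_balanced_weights` and `weighted_cross_entropy` are hypothetical.

```python
import numpy as np

def class_balanced_weights(labels, num_classes):
    """Inverse-frequency class weights (an assumed scheme, not the chapter's).

    Scaled so that the average weight per training sample equals 1.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    weights = np.zeros(num_classes)
    present = counts > 0
    weights[present] = len(labels) / (present.sum() * counts[present])
    return weights

def weighted_cross_entropy(probs, labels, weights):
    """Mean of -w[y] * log(p[y]) over the batch."""
    eps = 1e-12  # guards against log(0)
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-weights[labels] * np.log(picked + eps)))

# Toy, imbalanced label set: six samples of class 0, two of class 1, one of class 2.
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])
weights = class_balanced_weights(labels, num_classes=3)

# Uniform softmax outputs, used only to exercise the loss function.
probs = np.full((len(labels), 3), 1.0 / 3.0)
loss = weighted_cross_entropy(probs, labels, weights)
```

Rare families receive larger weights, so a misclassified sample from a small family contributes more to the loss than one from a large family.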


The rest of this paper is organized as follows. Section 2.2 describes the malware identification and classification techniques discussed in the literature. Section 2.3 highlights the datasets that are available for malware detection. Section 2.4 presents the deep architectures that are suitable for malware recognition. Section 2.5 describes the proposed DeepNet model and its architecture. Section 2.6 presents the experimental results of the DeepNet model and compares them against other ML and DL models. Section 2.7 concludes the current work.


2.2 Literature Survey 


Malware analysis can be classified into two main groups: static and dynamic [8]. Both manual and automated analysis are possible. While automated analysis demands substantial data science and programming skills, manual analysis requires subject expertise. Static analysis of malware is the first step, followed by dynamic analysis, which comes last. Static analysis identifies the structure of the malware sample without running the actual malicious programs. The purpose and functionality of malware are revealed through advanced static analysis, which calls for highly specialized knowledge of operating system concepts and assembly code instructions.

Dynamic analysis involves running programs and analyzing the malware’s activities. Compared with static analysis, dynamic analysis describes the actual capabilities of malware more precisely. The two classes of dynamic analysis are basic dynamic analysis and advanced dynamic analysis. Basic dynamic analysis uses monitoring tools, while advanced dynamic analysis uses debugging tools to examine the activities of the malware.

Hybrid analysis was introduced to overcome the drawbacks of static and dynamic analysis. Static analysis checks application and user permissions and suspicious code, whereas dynamic analysis checks the behavior of the application. To improve malware analysis, this approach analyzes the signature of any malicious code and combines it with other behavioral pattern factors.

Without the user’s knowledge, an attacker can deliver malware, open a backdoor, and start mining bitcoin, a cryptocurrency, on the user’s system. Malware that resides on the hard drive and runs in memory is either not verified, or there is a strong possibility that its signature and behavioral pattern [9] differ between the copy on the hard drive and the one running in memory.


2.2.1 ML or Metaheuristic Methods for Malware Detection


RF, KNN, and AdaBoost are high-performing classifiers, and prior research indicates successful mobile malware detection with RF and KNN. In the signature-based malware detection method, signatures are extracted and compared, and samples are classified as malicious based on the comparison. In automatic malware detection [10], string signatures were retrieved automatically using a variety of library recognition methods and diversity-based criteria. The application contains various cryptographic hash-based signatures in accordance with the tamper-evident architecture. These signatures allow for the detection of Trojans hidden within the hardware.

There are certain drawbacks of the machine learning algorithms used in detecting malware [11].


1. AdaBoost has good detection capabilities for new discoveries and gave malware version 8 a perfect score.

2. Due to the increased noise in the dataset, using real-world data for model training may have resulted in worse performance.

3. When using machine learning algorithms such as KNN and Bayes Network, the false-negative rate (malicious wrongly detected as benign) is lower than the false-positive rate (benign wrongly detected as malicious).

4. AdaBoost gives a good accuracy rate in finding ransomware when the dataset is small.

5. Tracking real-time malicious or suspicious data takes a long time when the KNN algorithm is used; because smartphones have less computation power, KNN is not suitable for mobile phones.

6. Inadequate training instances may lower the performance on the identification of Spyware SMS and Adware.

7. F1 scores are lower than those of other cutting-edge dynamic malware detectors.

8. The false-negative rate is higher, which means most of the suspicious actions are not detected.
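
The false-positive and false-negative rates referred to in the list above follow directly from confusion-matrix counts. The sketch below is illustrative only; the function name `error_rates` and the counts are made up for the example and are not results from the chapter.

```python
def error_rates(tp, fp, tn, fn):
    """Return (false-positive rate, false-negative rate).

    tp: malicious samples flagged as malicious
    fp: benign samples wrongly flagged as malicious
    tn: benign samples passed as benign
    fn: malicious samples wrongly passed as benign
    """
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr

# Hypothetical counts: 100 malicious and 100 benign test samples.
fpr, fnr = error_rates(tp=90, fp=5, tn=95, fn=10)
```

A high false-negative rate is the more dangerous failure for a detector, since it corresponds to malware that goes unnoticed.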


Most ML algorithms give good accuracy, but that depends on the dataset and on how the model is trained to find the malware. Metaheuristic algorithms are used for detecting malware with the highly correlated features rather than all of the features, which is an optimized approach. Yet these ML or nature-inspired algorithms are not effective for real-time datasets or for mobile phones; consequently, deep learning should be applied to handle large datasets and systems with less computation power.


2.2.2 Deep Learning Algorithms for Malware Detection 


Deep learning is a subset of machine learning that learns from the input at many levels to provide improved knowledge representations. Convolutional Neural Networks (CNN) were developed to advance computer vision through deep learning. Deep learning models learn complex features by training a complicated model with numerous convolutional layers and millions of parameters.

Regarding the amount of time and machine configuration needed for the experiment, LSTM [12] was the right approach. The hexadecimal samples of cleanware and malware are transformed to numerical values by the input layer and then fed sequentially to the LSTM layer. The model employs a stateful LSTM, which aids in identifying relationships among the text sequences taken from the hex dumps of the dataset files. The LSTM layer consists of about 128 neurons; it takes inputs from the layer above and produces outputs using a linear activation function. The network’s features, time steps, and number of neurons were decided upon after multiple experiments.

Like an LSTM, the Gated Recurrent Unit (GRU) [13], a potent variation of a conventional Recurrent Neural Network (RNN), uses an integrated gating mechanism as a short-term memory solution. The gates inside the GRU control and regulate the information flow. They assist the GRU cell in learning which information is crucial to store or remove. As a result, crucial information is passed on to enable prediction.

Convolutional neural networks play a major role in cybersecurity. Compared to traditional feature selection algorithms, CNN [14] can automatically learn the crucial features. CNN is viewed as a series of connected processing elements created with the goal of converting a set of inputs into a set of desired outputs. Convolution, pooling, flattening, and padding are just a few of the operations that CNN runs on the input data before connecting to a fully connected neural network. The performance of the CNN architecture depends on its capacity to recognize and combine local input patterns in a parameter-efficient manner. The CNN analyses an app’s opcodes as text to be mined for malware-indicating patterns, concentrating on extracting n-gram characteristics from these sequences.
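
To make the notion of opcode n-grams concrete, the sketch below counts them explicitly. A CNN does not count n-grams this way; its convolution filters slide over windows of the same kind and learn which patterns matter. The opcode sequence and the name `opcode_ngrams` are hypothetical.

```python
from collections import Counter

def opcode_ngrams(opcodes, n):
    """Count every run of n consecutive opcodes in the sequence."""
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )

# Made-up opcode trace used only for illustration.
opcodes = ["push", "mov", "call", "push", "mov", "ret"]
bigrams = opcode_ngrams(opcodes, 2)
```

Here the bigram `("push", "mov")` occurs twice; recurring windows of this kind are the local patterns that a convolution filter of width `n` can respond to.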

The behavior-based DL framework (BDLF) comprises Stacked Autoencoders (SAE) [15], one of the most advanced deep learning algorithms for malware detection, which takes its input from behavior graphs. The cloud platform (CP) and the Internet of Things (IoT) environment (IoTE) modules play an essential part in BDLF. The remote computers and other smart devices that make up the IoTE module send scanning data or suspicious newly installed files to the CP and receive results from the CP. The task of detecting the scanning data or files sent from the IoTE falls on the detectors in the CP. For scanning data, the CP builds behavior graphs, converts the API call graphs into binary vectors, and then feeds the binary vectors into SAE models for malware identification. For any suspicious files, the CP runs them in the Cuckoo Sandbox and then pulls the API calls from the sandbox’s monitoring files. From that point on, the CP handles the monitoring data in the same way as the scanning data. Following detection, the CP informs the IoTE. API call graphs are built so that they can combine API requests to learn malicious behavior.

A fast DL-based malware recognition technique that uses raw binary images is presented with Densely Connected Convolutional Networks (DenseNet) [16]; it requires no knowledge of reverse engineering, binary execution, or code disassembly. In contrast with earlier CNN models, the DenseNet model uses fewer parameters and allows the concatenation of features. Improved malware detection is facilitated by the DenseNet model’s built-in deep supervision mechanism. Furthermore, with reduced malware training datasets, the regularizing capacity of the dense connections reduces overfitting.


2.3 Malware Datasets


2.3.1 Android Malware Dataset 


The CICMalDroid 2020 dataset [17] consists of 17,341 samples collected from 2017 to 2018. Its sources include the Contagio security blog, the VirusTotal service, MalDoz, AMD, and various other datasets used for cyber research. It is important for cybersecurity experts to categorise Android malware apps in order to implement effective mitigation and countermeasure procedures. Therefore, the dataset is deliberately divided into five categories: Banking malware, SMS malware, Adware, Riskware, and Benign.


2.3.2 SOREL-20M Dataset 


SOREL-20M [18] attempts to address these problems in whole or in part. By offering orders of magnitude more data for analysis, it addresses the problem of training size. Its authors report that, although performance improves with bigger datasets, establishing a stable rank order among models and evaluating performance at low false positive rates only requires validation sizes of about 3 to 4 million cases. Applying the suggested time splits to create the training, validation, and test sets yields 12,699,013 training samples, 2,495,822 validation samples, and 4,195,042 test samples, respectively. Baseline LightGBM and PyTorch-based feed-forward neural network (FFNN) models are provided using SOREL-20M. Although both models perform well, there is still much room for improvement, especially at lower false positive rates. As a result, SOREL-20M should be valuable for comparing various malware detection strategies. In addition, benchmarks are presented for a multi-target model and a variety of extra targets that describe behaviors inferred from vendor labels.


2.4 Deep Learning Architecture 


The basic deep learning architectures suitable for malware detection are 
described briefly in this section. 


2.4.1 Deep Neural Networks (DNN) 


A DNN is a conventional artificial neural network with numerous interconnected layers between the input and output layers [19]. To transform the input into the output, the DNN determines the appropriate mathematical computation. The computations are carried out by the numerous neurons in each layer of the DNN. Each node accepts input, processes it using stored weights, applies an activation function, and then passes the result to the next node in line until a conclusion is reached, as shown in Figure 2.1.
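
The layer-by-layer computation described above can be sketched as a forward pass in NumPy. The layer sizes, the ReLU activation, and the random weights are assumptions chosen for illustration; the chapter does not specify them.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, layers):
    """Propagate x through a list of (W, b) pairs.

    Hidden layers apply weights, bias and ReLU; the last layer is left linear.
    """
    a = x
    for i, (W, b) in enumerate(layers):
        z = a @ W + b                                  # weighted sum of inputs
        a = relu(z) if i < len(layers) - 1 else z      # activation function
    return a

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(8, 16)), np.zeros(16)),  # input (8 features) -> hidden
    (rng.normal(size=(16, 4)), np.zeros(4)),   # hidden -> hidden
    (rng.normal(size=(4, 2)), np.zeros(2)),    # hidden -> output (2 scores)
]
x = rng.normal(size=(5, 8))                    # batch of 5 samples
out = forward(x, layers)
```

Each row of `out` holds the two output scores for one input sample; in a trained detector the weights would be learned rather than drawn at random.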


2.4.2 Convolutional Neural Networks (CNN) 


Figure 2.1 Architecture of DNN.

CNN is presently thriving in the realm of cybersecurity after attaining remarkable results in the disciplines of image recognition, audio recognition, and computer vision [25]. Compared to traditional feature selection algorithms, CNN can automatically learn the crucial features [9]. CNN is viewed as a series of linked processing elements designed to convert a set of inputs into a set of desired outputs. The three primary parts of a CNN classifier are the input, output, and hidden layers, as shown in Figure 2.2. Convolution, pooling, flattening, and padding are the fundamental operations that CNN runs on the input data before connecting to a fully connected neural network.
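
The four operations named above (padding, convolution, pooling, flattening) can be shown on a toy image. This is a minimal single-channel sketch with a fixed averaging kernel; a real CNN learns its kernels, and the helper names `conv2d` and `max_pool` are hypothetical.

```python
import numpy as np

def conv2d(image, kernel, pad=0):
    """Zero-pad the image, then slide the kernel over it.

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    image = np.pad(image, pad)
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.ones((3, 3)) / 9.0                      # 3x3 averaging kernel
feature_map = conv2d(image, kernel, pad=1)          # padding keeps the 6x6 size
pooled = max_pool(feature_map)                      # 3x3 after 2x2 pooling
flat = pooled.ravel()                               # vector for the dense layers
```

The flattened vector `flat` is what would be handed to the fully connected part of the network.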


2.4.3 Recurrent Neural Networks (RNN) 


The RNN variant known as Long Short-Term Memory (LSTM) is a remarkable classifier for learning and mining temporal data. To learn long-term characteristics and relationships, the LSTM model makes use of a special memory module [10]. Additionally, using different “gate” structures mitigates problems in the backpropagation of error. The “gate” state, which controls the data stream and memory, determines the internal values of the memory module based on the currently available information and prior flows. Each LSTM cell has three gates: the input gate, the forget gate, and the output gate. Two different states are also available, the hidden state and the cell state, as shown in Figure 2.3.
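
The interaction of the three gates with the hidden and cell states can be written out as a single LSTM time step. This follows the standard LSTM formulation; the dimensions and random weights are placeholders, and the packing of all gate weights into `W`, `U` and `b` is a convention assumed for this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step.

    W: (4H, D), U: (4H, H), b: (4H,), stacked in the order
    input gate, forget gate, output gate, candidate values.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to admit
    f = sigmoid(z[H:2 * H])        # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate cell values
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4                        # input size and hidden size (arbitrary)
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):  # a sequence of five time steps
    h, c = lstm_step(x, h, c, W, U, b)
```

The cell state `c` carries long-term information across steps, while the hidden state `h` is the output passed to the next layer at each step.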


2.4.4 Deep Belief Networks (DBN) 


Figure 2.2 Architecture of CNN.

Figure 2.3 Architecture of RNN.

A DBN is a stack of Restricted Boltzmann Machines (RBMs) layered as a self-organizing graphical network. An RBM is an undirected generative model in which the units of the same layer are not connected; only units in different layers are connected [35]. The construction of a DBN is straightforward: RBMs are stacked to create an unsupervised network in which each pair of visible and hidden layers is treated as a separate network. The next RBM treats this hidden layer as its visible layer, and so on. The representation is learned layer by layer for each RBM in each sub-network, and the layers act as feature extractors. After pretraining, supervised learning is used for fine-tuning, which enables the DBN to perform binary classification in order to determine whether the input features correspond to malicious software or a benign application.
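
Unsupervised pretraining of a single RBM is commonly done with one-step contrastive divergence (CD-1). The chapter does not state which training rule is used, so CD-1, the learning rate, and the toy data below are assumptions made for this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_vis, b_hid, lr, rng):
    """One CD-1 update on a batch of binary visible vectors v0 (updates in place)."""
    h0_prob = sigmoid(v0 @ W + b_hid)                      # positive phase
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    v1_prob = sigmoid(h0 @ W.T + b_vis)                    # reconstruction
    h1_prob = sigmoid(v1_prob @ W + b_hid)                 # negative phase
    n = v0.shape[0]
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis += lr * (v0 - v1_prob).mean(axis=0)
    b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
    return float(np.mean((v0 - v1_prob) ** 2))             # reconstruction error

rng = np.random.default_rng(0)
data = (rng.random((20, 6)) < 0.5).astype(float)           # toy binary inputs
W = 0.01 * rng.normal(size=(6, 3))                         # 6 visible, 3 hidden units
b_vis, b_hid = np.zeros(6), np.zeros(3)

errors = [cd1_update(data, W, b_vis, b_hid, lr=0.1, rng=rng) for _ in range(50)]

# Hidden activations become the "visible" input of the next RBM in the stack.
hidden_features = sigmoid(data @ W + b_hid)
```

Stacking proceeds by training a second RBM on `hidden_features`, which is the sense in which each hidden layer serves as the visible layer of the next one.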


2.4.5 Stacked Autoencoders (SAE) 


A stack of autoencoders acting as hidden layers in a neural network is known as a stacked autoencoder. The stacked autoencoder, an unsupervised machine learning method, uses a backpropagation mechanism to reconstruct the output value [36]. With denoising autoencoders built into the layers, this neural network enhances accuracy in deep learning. Stacked autoencoders typically involve the following three steps.


Stage 1: The input data is used to train the first autoencoder, which then produces the learned features.

Stage 2: The next layer takes these features as its input, and so on, until the training is finished.

Stage 3: Once the hidden layers are trained, the backpropagation (BP) algorithm is used to minimize the cost function and update the weights using labeled training data to achieve the desired performance.
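
Stages 1 and 2 above amount to greedy layer-wise training, which can be sketched in NumPy as follows. The layer sizes, learning rate, linear decoder, and squared-error loss are assumptions for illustration, and the supervised fine-tuning of Stage 3 is left out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, epochs, lr, rng):
    """Train one autoencoder (sigmoid encoder, linear decoder) by gradient descent."""
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)              # encode
        X_hat = H @ W2 + b2                   # decode (reconstruction)
        err = (X_hat - X) / n                 # gradient of 0.5 * mean squared error
        grad_W2, grad_b2 = H.T @ err, err.sum(axis=0)
        dH = (err @ W2.T) * H * (1.0 - H)     # backpropagate through the sigmoid
        grad_W1, grad_b1 = X.T @ dH, dH.sum(axis=0)
        W1 -= lr * grad_W1; b1 -= lr * grad_b1
        W2 -= lr * grad_W2; b2 -= lr * grad_b2
    return W1, b1                             # only the encoder is kept

def encode(X, W1, b1):
    return sigmoid(X @ W1 + b1)

rng = np.random.default_rng(0)
X = rng.random((30, 16))                      # toy data: 30 samples, 16 features

features = X
encoders = []
for n_hidden in (8, 4):                       # Stage 1, then Stage 2
    W1, b1 = train_autoencoder(features, n_hidden, epochs=200, lr=0.5, rng=rng)
    encoders.append((W1, b1))
    features = encode(features, W1, b1)       # input for the next autoencoder
```

After the loop, `features` holds the 4-dimensional codes produced by the stack; fine-tuning with labels would start from the weights stored in `encoders`.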
The proposed work integrates Deep Belief Networks with RBM and SAE as a hybrid model to extract informative features from the dataset before passing it on for malware classification using softmax classifiers. The proposed framework uses RBM with DBN and SAE over the other DL algorithms for the following reasons:


i) The proposed DeepNet model uses three datasets with different dimensions, which in turn might affect the working efficiency of the proposed framework. Here, RBM plays the major role of dimensionality reduction, reducing the number of random variables in the malware datasets to a set of principal variables.

ii) Being generative models, DBNs can be used in either an unsupervised or a supervised setting. This means that DBNs in the proposed framework are also used for feature learning/extraction. Specifically, in feature learning we perform layer-by-layer pretraining in an unsupervised manner on the different RBMs that form a DBN.

iii) Each RBM model performs a non-linear transformation on its input vectors and produces output vectors that serve as input for the next RBM model in the sequence. This gives DBNs a great deal of flexibility and makes them easier to extend.

iv) During feature learning/extraction, the SAE applies a nonlinear transformation to find the principal component directions, while the DBN relies on the probability distribution of the samples to extract high-level representations.

v) The underlying mechanisms of the sparse autoencoder and the RBM differ in principle. In training, however, the SAE mostly uses the gradient descent method, as does the DBN with RBM. The overall training procedure of the SAE and the DBN is therefore consistent, proceeding layer by layer.


2.5 Proposed System 


2.5.1 Datasets Used 


The proposed system considers three datasets, namely the Malimg [23], BIG 2015 [24], and MaleVis [22] datasets. The Malimg dataset contains images of 9,339 malware samples belonging to 25 families; the dataset is helpful for multi-class malware classification. The BIG 2015 dataset, which was introduced by Microsoft for malware classification, contains about 21,741 malware binary samples representing 9 distinct families. The MaleVis malware dataset is an image dataset generated from 25 malware classes and one benign software class; it is applicable to vision-based malware identification and is specially designed for implementing deep learning architectures. The dataset comprises around 14,226 RGB malware images belonging to the 26 classes. The malware images in the above datasets are converted into binaries in order to avoid ambiguity among the input variables that are passed into the proposed DeepNet architecture for further processing. Table 2.1 presents the training and testing samples for all three datasets.

Table 2.1 Datasets used in the proposed system.

Dataset | Family/Classes | Total samples | Training samples | Testing samples
Malimg [20] | Adialer.C, Agent.FYI, Allaple.A, Allaple.L, Alueron.gen!J, Autorun.K, C2LOP.gen!g, C2LOP.P, Dialplatform.B, Dontovo.A, Fakerean, Instantaccess, Lolyda.AA1, Lolyda.AA2, Lolyda.AA3, Lolyda.AT, Malex.gen!J, Obfuscator.AD, Rbot!gen, Skintrim.N, Swizzor.gen!E, Swizzor.gen!I, VB.AT, Wintrim.BX, Yuner.A | 9339 | 6437 | 2115
BIG 2015 [21] | Ramnit, Lollipop, Kelihos_ver3, Vundo, Simda, Tracur, Kelihos_ver1, Obfuscator.ACY, Gatak | 21741 | 8338 | 3573
MaleVis [22] | Vilsel, VBKrypt, VBA/Helium.A, Stantinko, Snarasite.D!tr, Sality, Regrun.A, Neshta, Neoreklami, MultiPlug, InstallCore.C, Injector, Hlux!IK, HackKMS.A, Fasong, Expiro-H, Elex, Dinwod!rfn, BrowseFox, AutoRun-PU, Androm, Amonetize, Allaple.A, Agent-fyi, Adposhel | 14226 | 9958 | 4268


2.5.2 System Architecture 

Figure 2.4 Overall design & flow of the proposed DeepNet model.


2.5.3 Data Preprocessing


Data preprocessing is the process of eliminating noise, missing values, and inconsistent data found in the dataset. The raw data is cleaned and then transformed into a format suitable for further processing. The proposed system considers three datasets (refer to Table 2.1) that contain malware images, which are converted into malware binaries and then fed as input to the DeepNet algorithm. Initially, each malware image is interpreted as a 2D matrix, which is then converted into 8-bit vectors with values ranging from 0 to 255, followed by the conversion of the vectors into binaries in the form of 1’s and 0’s. The proposed DeepNet model can handle RGB/grayscale images, but binaries simplify the algorithm and reduce computational requirements, whereas processing RGB layers is more complex. Hence, the converted malware binaries are used for further processing, as shown in Figure 2.5. In a similar way, the training samples of all three datasets are converted into binary files and carried forward to the feature extraction process.
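
The conversion from a 2D image matrix to 8-bit values and then to a sequence of 1’s and 0’s can be sketched with NumPy. The helper name `image_to_bits` and the 2x2 toy image are hypothetical; the chapter does not specify the bit order, so most-significant-bit-first is assumed here.

```python
import numpy as np

def image_to_bits(image):
    """Flatten a 2D grayscale image (values 0..255) into a 0/1 vector.

    Each pixel contributes 8 bits, most significant bit first.
    """
    pixels = np.asarray(image, dtype=np.uint8)   # 8-bit values, 0..255
    return np.unpackbits(pixels.ravel())         # 8 bits per pixel

# Toy 2x2 "malware image".
image = np.array([[0, 255],
                  [5, 128]], dtype=np.uint8)
bits = image_to_bits(image)
```

The conversion is lossless: `np.packbits(bits).reshape(image.shape)` recovers the original pixel matrix.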


2.5.4 Proposed Methodology 


The complete scheme of the proposed malware recognition method is illustrated in Figure 2.4. The structure of the proposed DeepNet model with DBN and SAE layers is shown in Figure 2.5 and Algorithm 1. The input binary images are fed into the DeepNet model for feature extraction and classification. The model is trained by feeding the binary images directly into the DBN and SAE layers. The proposed DeepNet model with DBN [34] and SAE [34] layers has a strong ability to extract distinguishing features that broadly describe the image and to learn task-specific features.

These layers automatically learn features at multiple levels of abstraction, allowing them to learn complex functions by mapping the raw input data to the desired output. The proposed model uses DeepNet to extract all of the features from the malware datasets and then trains DeepNet on most of the extracted features. Each deep layer can extract fine details from the binary images. The output feature maps obtained after passing through these layers are provided as input to a fully connected (FC) layer. The FC layer classifies the malware samples into their respective categories.


Algorithm 1. DeepNet algorithm


Input: Binary image samples
Output: Predicted class Ci


a. Convert the binaries to 2D grayscale images and store them in the
array I.

b. Initialise the model.

c. Extract features from the input samples in an unsupervised manner.

d. Randomly load the input training samples.

e. Feed the normalised training samples to the DeepNet network (DBN +
SAE), with the number of input neurons set to the number of features in
the input training samples.

f. Repeat the procedure until DeepNet training reaches the iteration
limit or the convergence condition.

g. Connect every layer by concatenating the feature maps of all
preceding layers.

h. Classify the input samples into their corresponding classes using a
softmax classifier.
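
Step a of Algorithm 1 can be illustrated with a minimal sketch. The chapter does not state the image width or how a partial last row is handled, so the `width` parameter and the zero-padding below are assumptions made only for illustration, and `bytes_to_grayscale` is a hypothetical helper name.

```python
def bytes_to_grayscale(data, width):
    """Arrange raw bytes into rows of `width` pixels with values 0-255.

    Assumption: the last row is padded with zeros when the byte count
    is not a multiple of `width`.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padding = (-len(data)) % width
    pixels = list(data) + [0] * padding
    # Slice the flat pixel list into rows to form the 2D array I.
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


image = bytes_to_grayscale(b"\x4d\x5a\x90\x00\x03", width=2)
print(image)  # [[77, 90], [144, 0], [3, 0]]
```

Each byte of the binary becomes one 8-bit grayscale pixel, so no information is lost apart from the added padding.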


[Figure: malware samples (e.g., Allaple.A) pass through data
pre-processing, where malware images are converted into 8-bit vectors and
then into malware binaries; the feature extractor combines a stacked
autoencoder (SAE) and deep belief networks (DBN) with RBM; output classes:
Ramnit, Lollipop, Kelihos_ver3, Vundo, Simda, Tracur]


Figure 2.5 Architecture of the proposed DeepNet methodology.


2.5.5 DeepNet


DeepNet is a DL model in which all layers are fully connected, which
gives an efficient flow of information among them. Each layer receives
additional input from all preceding layers and passes its own feature
maps on to all subsequent layers. The output feature maps obtained from
the current layer are combined with those of the earlier layers through
concatenation. Because each layer is connected to every subsequent layer
of the network, the model is named DeepNet. This model requires fewer
parameters than traditional deep learning models. It also reduces the
overfitting problem that occurs with small malware training sets.
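
The connectivity pattern described above can be sketched in a few lines. This is a toy illustration only: the layer functions are hypothetical stand-ins, not the chapter's DBN or SAE layers, and `dense_forward` is an assumed name.

```python
def dense_forward(inputs, layers):
    """Pass features through layers where each layer sees every earlier output.

    `layers` is a list of callables; each takes the concatenated features of
    all preceding layers and returns a list of new features.
    """
    features = list(inputs)
    for layer in layers:
        new_features = layer(features)
        # Concatenate the new feature maps with everything produced so far.
        features = features + list(new_features)
    return features


def sum_layer(features):
    # Hypothetical layer: emits a single feature, the sum of its inputs.
    return [sum(features)]


print(dense_forward([1.0, 2.0], [sum_layer, sum_layer]))  # [1.0, 2.0, 3.0, 6.0]
```

The second layer receives both the original input and the first layer's output, which is the information flow the paragraph describes.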

As discussed above, the data flows inside malicious applications
can differ considerably from those in benign applications. DeepNet
exploits such differences and similarities to automatically determine
whether unknown applications are malicious by using a DL model.
Conventional ML techniques, such as Bayesian classifiers, SVM, and MLP,
typically have fewer than three layers of computation units. A DL model,
as its name suggests, has a deep architecture with more than three hidden
layers. It aims to learn a layered representation of the input data that
produces useful features for conventional ML techniques. Each layer in
the hierarchy learns a more abstract and composite feature of the data.
Common deep architectures include CNN, Sparse Coding, RBM, DBN, RNN, and
SAE. In our study, we chose DBN and SAE to build DeepNet.


2.5.6 DBN 


A DBN is a type of DNN composed of multiple layers of hidden variables,
built from restricted Boltzmann machines (RBMs), with connections between
the layers but not between units within each layer. When trained on a
set of binary input image samples in an unsupervised manner, the RBM
layers in a DBN act as feature detectors that learn to probabilistically
reconstruct the feature vectors, progressively building high-level
representations. The weights in the DBN are adjusted, still in an
unsupervised manner, according to the error between the input feature
vectors and the reconstructed vectors. After this learning stage, the DBN
can be further trained with labelled application samples in a supervised
manner to perform classification. Back-propagation is then applied for
fine-tuning to improve accuracy. In this way, the DBN model is fully
built.


Figure 2.6 Structure of DBN model.


Our malware detection method DeepNet is implemented on the DBN
architecture above, as shown in Figure 2.6. After collecting and
normalising the feature vectors as described in the pre-processing
section, DeepNet feeds them into the DBN with RBM layers for
dimensionality reduction and feature extraction. The DBN model is
structured in the stages shown in Figure 2.6. In this way, DeepNet can
maintain its accuracy in detecting continuously evolving new malware.


2.5.7 SAE 


The architecture of our proposed design using SAE is presented in Figure
2.7. The pre-processed features are given as input to the SAEs. There are
3 to 4 layers in the proposed SAE model. Following the design of the
network layers, training starts from the layer 1 autoencoder (AE) of the
SAE model. We store the trained parameters of the layer 1 AE to give a
useful initial state for the training of layer 2. When the layer 2 network
is trained, the original data is first passed through the layer 1 network
to obtain the input of layer 2. Similarly, layers 3 and 4 are trained
separately once the layer 2 network is trained.

When each layer is trained separately in the network, the optimiser
used is Adam [33]. To improve the performance of the model, a cascading
strategy is used to optimise the network parameters. Cascading all the
layers together yields a new network.


Figure 2.7 Structure of SAE model.


At this point, the output of the first layer is retrieved. With layer 1
as the start of the cascaded network, the subsequent layers are
reinitialised, and the parameters such as the weights trained in each
layer are passed on. The performance of the proposed network depends on
the number of stacked AE layers, and it degrades when too many successive
AE layers are stacked in the SAE model. Thus, depending on the dynamic
inputs from the malware datasets, the number of stacked layers may be
increased or decreased.

Once the whole model is trained, we obtain the final SAE detection
model. In the third stage, we introduce our model tuning parameters and
cascade them in the training procedure. Our model is a neural network
composed of AEs that trains multi-layer networks layer by layer, training
each layer in turn by means of an AE in the direction from front to back.
The output of the last layer serves as the input feature of the softmax
classifier, and the classification results are produced by softmax.


2.5.8 Categorisation 


The classification layer consists of a fully connected (FC) softmax
layer. In the FC layer, the number of neurons is fixed to the number of
malware classes present in the dataset. The softmax function is used for
multi-class classification problems. This function computes the
probability distribution of each class i over all possible classes.
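
As a minimal sketch of the softmax function mentioned above, the following turns a list of raw class scores into probabilities that sum to 1. The subtraction of the maximum score is a common numerical-stability step and is an addition here, not something the chapter specifies.

```python
import math


def softmax(scores):
    """Return the probability of each class i given raw scores for all classes."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
```

The predicted class is the index with the highest probability, here class 0.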

The class imbalance problem is a classification problem in which the
distribution of classes in the training dataset is uneven. The degree of
class imbalance varies; a severe imbalance is harder to model and
requires advanced techniques to handle. The Malimg dataset and the
Microsoft BIG 2015 dataset are real-world and widely used malware datasets
that contain many samples for a few classes and very few samples for
certain others. Models trained on these uneven sample sizes are biased
towards the dominant classes. Data augmentation techniques for data
imbalance, such as oversampling of minority classes or down-sampling of
majority classes, are not suitable for malware detection problems.
Oversampling is unlikely to generate samples equivalent to real malware
binaries, and down-sampling may discard important malware variants.

Reweighting the loss by inverse class frequency usually results in poor
performance on real-world data with a high class imbalance. The proposed
malware detection model uses a class-balanced loss [32] with a weighting
factor Wi that is inversely proportional to the number of samples for
class i.
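
The sketch below illustrates only the proportionality stated above, a weight Wi that shrinks as the sample count of class i grows. The exact formulation of the class-balanced loss in [32] may differ; the normalisation (weights summing to the number of classes) and the helper name are assumptions for illustration.

```python
from collections import Counter


def inverse_count_weights(labels):
    """Return a weight per class that is inversely proportional to its sample count.

    Assumption: weights are rescaled so that they sum to the number of classes.
    """
    counts = Counter(labels)
    raw = {cls: 1.0 / n for cls, n in counts.items()}
    scale = len(counts) / sum(raw.values())
    return {cls: w * scale for cls, w in raw.items()}


# Hypothetical toy labels: 8 samples of one class, 2 of another.
weights = inverse_count_weights(["Allaple.A"] * 8 + ["Simda"] * 2)
```

In this toy example the minority class receives four times the weight of the majority class, so its samples contribute more to the loss.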


2.6 Result and Analysis 


The dataset was randomly split into 70% training and 30% validation
sets. The experiments used 1,043 cleanware samples together with each of
the three malware datasets. Training and test sets were split such that
30% of the total samples were held out for evaluation. The proposed
malware detection framework was trained on 6,437 samples and tested on
2,115 samples for the Malimg dataset with cleanware samples (9,339 +
1,043). The model was then trained on 8,338 samples and tested on 3,573
samples from the BIG 2015 dataset together with cleanware samples
(10,868 + 1,043). On the MaleVis dataset, 9,958 samples were used for
training and 4,268 for testing.
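
A random 70/30 split of this kind can be sketched as follows. The seed, the rounding rule, and the function name are assumptions; the chapter does not say how the split was implemented. The sample count used is the BIG 2015 total quoted above (10,868 + 1,043).

```python
import random


def split_70_30(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Sample indices stand in for the BIG 2015 and cleanware samples.
train, test = split_70_30(range(10868 + 1043))
print(len(train), len(test))  # 8338 3573
```

With rounding, the resulting sizes agree with the 8,338 training and 3,573 test samples reported for BIG 2015.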

The experiments were run on a Linux system with an Intel®
Xeon(R) CPU E3-1226 v3 at 3.30 GHz × 4, 32 GB RAM, and an
NVIDIA GM107GL Quadro K2200/PCIe/SSE2 GPU. The performance evaluations
were carried out with the following hyperparameter settings: 100 epochs,
learning rate 0.0001, and batch size 32. The proposed deep neural network
model was implemented in Python using the TensorFlow
Python library [26] and the Keras v0.1.1 DL library.

Four types of outcome are counted to evaluate class predictions.


a. True Positive (TP): a sample that is predicted to belong to a
class and does belong to that class, i.e., a sample that is
classified as malware and is malware.

b. True Negative (TN): a sample that is predicted not to belong to
a class and does not belong to that class, i.e., a sample that
is classified as not malware and is not malware.

c. False Positive (FP): a sample that is predicted to belong to a
class but does not belong to that class, i.e., a sample that is
classified as malware and is not malware.

d. False Negative (FN): a sample that is predicted not to belong to
a class but does belong to that class, i.e., a sample that is
classified as not malware and is malware.


Accuracy (Acc), Precision (Pr), Recall (Re), and F1 score are the four
key classification metrics. Accuracy is the number of correct predictions
divided by the total number of predictions. It is defined as


Acc = (TP+TN)/(TP+FP+TN+FN)


Precision is the number of correct positive results divided by the
number of positive results predicted by the classifier. It is defined as


Pr = TP/(TP+FP)


Recall is the fraction of correctly identified positive instances
among all actual positive instances. It is given by


Re = TP/(TP+FN)


F1 score is the harmonic mean of precision and recall. It reflects both
the classifier's precision and its robustness. It is given by


F1 score = 2 x ((Pr x Re)/(Pr + Re))
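
The four formulas above can be checked with a short sketch. The counts in the example are made up for illustration and are not results from this chapter; the function assumes all denominators are non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Acc, Pr, Re, and F1 score from the four outcome counts.

    Assumption: no denominator is zero (no guard is included).
    """
    acc = (tp + tn) / (tp + fp + tn + fn)
    pr = tp / (tp + fp)
    re = tp / (tp + fn)
    f1 = 2 * (pr * re) / (pr + re)
    return acc, pr, re, f1


# Hypothetical counts: 90 TP, 80 TN, 10 FP, 20 FN.
acc, pr, re, f1 = classification_metrics(tp=90, tn=80, fp=10, fn=20)
```

For these counts the accuracy is 170/200 = 0.85 and the precision is 90/100 = 0.9.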


The evaluation results of conventional approaches for malware
detection are shown in Tables 2.2 and 2.3 and Figures 2.8 and 2.9,
respectively.

Table 2.2 Comparison of ML-based techniques with the proposed DeepNet model for the three training datasets.

Table 2.3 Comparison of DL-based techniques with the proposed DeepNet model for the three training datasets.

Figure 2.8 Comparison of ML-based techniques with the proposed DeepNet model.

Figure 2.9 Comparison of DL-based techniques with the proposed DeepNet model.


The performance of the proposed model is compared
with several ML methods such as K-Nearest Neighbor (KNN), Logistic
Regression (LR), Naive Bayes (NB), SVM, Decision Tree (DT), Random
Forest (RF), and AdaBoost. Pretrained DL models for malware detection,
such as CNN and its variants, are used to examine the effectiveness
of the proposed DeepNet-based malware detection technique.

Table 2.4 Comparison of ML and DL models with the proposed DeepNet
model for the Malicia (unseen) dataset.


Methods    Acc (%)    Pr      Re      F-score
DeepNet    90.2       0.90    0.90    0.90

[Figure: bar chart of Acc (%), Pr, Re, and F-Score on a percentage axis
from 0 to 100 for the ML methods, the DL methods, and DeepNet]


Figure 2.10 Comparison of ML & DL-based techniques with the proposed DeepNet model.

The performance results obtained by the proposed model are
better than those of the other traditional malware detection approaches on
the three training datasets. The proposed model achieved an accuracy
of 98.7% for Malimg, 98.5% for BIG 2015, and 98.2% for MaleVis.

The generalisation of the proposed method is assessed on an
unseen dataset. This dataset has not been seen by the proposed DeepNet
model and is used to evaluate how well it performs on different samples.

Figure 2.11 Confusion matrix for the Malimg dataset.


The three training malware datasets contain classes entirely different
from those in the Malicia dataset. The comparison of the proposed method
with the ML and DL approaches on the unseen Malicia dataset
is presented in Table 2.4 and Figure 2.10, respectively. The results on the
unseen Malicia dataset show an accuracy of 90.2%, which is
higher than the performance of the ML and DL approaches trained on the
same datasets.

The confusion matrices for the models trained on the three malware
datasets together with the benign class are given in Figures 2.11-2.13.
For the Malimg dataset with 26 classes, the confusion matrix is a 26 x 26
matrix with the columns representing the actual class and the rows
indicating the predicted class. The diagonal elements show the number of
correctly classified samples, where the predicted class matches the actual
class. The off-diagonal elements represent misclassified samples. The
diagonal elements for each of the three datasets show higher values
than the off-diagonal elements. Although the Simda class has few
samples, most of the samples in that class were
correctly classified by the proposed model.

Figure 2.12 Confusion matrix for the BIG 2015 dataset.

Figure 2.13 Confusion matrix for the MaleVis dataset.
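
A confusion matrix with this orientation (rows for the predicted class, columns for the actual class) can be built as in the sketch below. The two-class labels are a made-up toy example, not data from the chapter.

```python
def confusion_matrix(actual, predicted, classes):
    """Return a matrix whose rows are predicted classes and columns actual classes."""
    index = {cls: i for i, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix


classes = ["Simda", "Ramnit"]
actual = ["Simda", "Simda", "Ramnit", "Ramnit", "Ramnit"]
predicted = ["Simda", "Ramnit", "Ramnit", "Ramnit", "Simda"]

matrix = confusion_matrix(actual, predicted, classes)
# Diagonal elements count the correctly classified samples.
correct = sum(matrix[i][i] for i in range(len(classes)))
print(matrix, correct)  # [[1, 1], [1, 2]] 3
```

Dividing the diagonal sum by the total number of samples gives the accuracy, here 3/5.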

Table 2.5 gives details of the time taken by the proposed model
to train and evaluate the samples. The proposed model and the malware
detectors based on various DL approaches are compared in terms of
computational efficiency. The results indicate that the proposed
DeepNet-based malware detection model takes less time to train on and
test the malware samples than other DL-based malware detection schemes.
The time and space complexity of the proposed DeepNet model is lower than
that of other existing ML and DL-based models, because the proposed system
makes use of balanced and standardized binary input data instead of the
imbalanced RGB image data used in the existing ML and DL-based models.

Table 2.6 compares the results of the proposed malware detection
model with previous works on the four malware datasets (3 training
datasets + 1 [unseen] test dataset). The proposed model outperforms
the other detection methods in the related works. The accuracy of
the proposed model (98.7%) is marginally higher than the accuracy
of the method by Roseline et al. (98.6%) on the Malimg dataset. The
results of the proposed model exceed those of the existing methods on the
BIG 2015, MaleVis, and Malicia datasets.


Table 2.5 Comparison of DL-based techniques with the proposed DeepNet
model based on computational time.

Table 2.6 Comparison of existing works with the proposed DeepNet model for the four datasets.


2.7 Conclusion & Future Work


There has been extensive research on malware detection and
classification, yet accurately detecting malware variants remains
a serious challenge for cyber security. Malware detection is a very
difficult task because of code obfuscation and packing techniques. To
accurately detect malware variants, this work presented a novel
deep learning architecture. The proposed architecture uses a
hybrid approach. First, datasets such as Malimg, MaleVis,
and BIG 2015 were used to obtain the malware data.
The features are then extracted using Stacked Autoencoders and
Deep Belief Networks with Restricted Boltzmann Machines.
The malware classification in the training stage of the proposed deep
neural network architecture is then performed using a softmax classifier.

The main contribution of the proposed method is the introduction
of a hybrid model made by optimally combining two deep learning
architectures. The performance of the proposed approach is assessed on the
Malicia dataset. Conventional machine learning and deep learning models
were compared with the proposed hybrid model. According to the
experimental results, the proposed method effectively classifies malware
with high accuracy, recall, precision, and F-score. In addition, the
proposed approach is observed to be robust and to reduce the feature space
substantially. Second, state-of-the-art methods were used to evaluate the
proposed model. The results obtained also reveal and validate the
superiority and benefit of the proposed method over other popular
conventional methods. On the other hand, a small percentage of malware
samples may not be correctly detected. This is because
such malware variants share features with other malware classes
and use complex code obfuscation techniques. In future work, a
detection method that specifically recognises and classifies malware that
uses obfuscation techniques may be implemented.


3.1 Introduction


A network is the collection of connected computers that allows one node 
to share resources, information, and programs with another node. A node 
can be a personal computer, server, hardware, or host. A network may be 
modest, consisting of only one system, or it may be as vast as desired. The 
use of a network is for sharing files, maintaining information, accessing 
the software, using the operating system remotely and resource sharing 
such as printer, scanner, etc. The medium used for communication can be 
guided or unguided, and the type of communication system can be wired 
or wireless. A physical channel such as coaxial cables, twisted wire cables, 
etc., serves as a medium in wired communication and directs the signals 
from one point to another. A guided medium is the one that operates in 
this manner. Examples are wired LAN and Ethernet. Installing a wired
network is quite expensive since coaxial cable installation takes a lot of
time and money. Peer-to-peer technology is therefore currently employed as
an option to decrease costs while simultaneously enhancing networking
and reliability, and wireless networks are installed everywhere
instead of those expensive wires.

Wireless communication, on the other hand, does not require a physical
channel and instead sends the signals through space. The medium used
in wireless communication is known as an unguided medium since space
only permits unguided signal transmission. A wireless network is an
interconnection of computing devices that are not connected by cables. The
wireless network is a collection of several networks that enables
connectivity between computers without the use of wired connections. For
connection, devices generally use radio waves. The communication range
can vary from a few meters to thousands of kilometres when using radio
waves. Freed from the barrier of a physical network, a wireless
network is used to connect multiple wired organizational structures
and to give connectivity within the organization, enabling employees
to move around freely. These devices allow roaming within the network
coverage and sharing of information and resources. To provide the
mobility feature, the network's topology is constantly changing. Therefore,
wireless networks are self-organizing and self-configuring. In addition to
transferring data like files and emails, wireless networks are mostly used
for audio and video conversations. A wireless environment is very beneficial
for real-world applications such as healthcare, industry, environment
monitoring, smart cities, etc. [1].

Figure 3.1 Wireless environment.


Figure 3.1 shows an example of a wireless environment. There are differ- 
ent types of network technologies used according to their network range 
and performance. Users are allowed to use different network technologies 
and can switch between them according to their needs. Wireless networks 
allow users to share information from anywhere within the range of the 
network topology. A device can be located far away from a router and 
yet be connected to the network since access points boost Wi-Fi signals. 
Examples of wireless networks are mobile phone networks, satellite com- 
munication networks and wireless sensor networks, 5G Cellular, Wi-Fi, 
Bluetooth, GPS, etc. 

In wireless networks, security will be crucial. Security is a main issue, 
especially when data is being transmitted between devices and needs to 
be protected and secure. Even though 3G and 4G networks already have 
independent security layers, certain well-known types of attacks are still 
a possibility. Computer network security refers to the steps that organiza- 
tions take to monitor and stop unwanted access from outside intruders. 
Network security refers to safety across all networks including network of 
networks. Steps for computer network security depend on the size of the 
network employed. For example, a school requires basic security features 
but a military or banking system requires high security features. 

Wireless networks are used in many real-time applications in various 
fields, such as military and health; as a result, a wireless network requires 
security to control vital information like personal location, etc. Wireless 
networks use radio waves rather than wires to transfer the data between 
devices. Because wireless networks use a broadcast transmission medium, 
they are vulnerable to security attacks. Wireless networks are susceptible
to unauthorized interception, hacking, and a variety of cyber threats since
they lack physical barriers. A computer network requires protection from
attackers and hackers. The risk of data modification, removal, and theft can
be decreased by using a strong network security system. There are two
fundamental protections in network security. The first is data security
against theft and illegal access. The second is computer security, which
protects information from hackers.


3.2 Literature Survey 


In most relevant surveys, various facets of wireless network security are
investigated. In presenting previous studies on the security of wireless
networks, this section offers information on some of the most common
security risks and challenges, together with their conclusions.

There are four primary categories of wireless networks: PAN, LAN,
MAN, and WAN. According to Kanika Sharma et al. [2], a wireless personal
area network (WPAN) is a wireless network used to connect devices
in a very limited area of up to 10 meters. Its range is around a single
person in a place. Bluetooth, infrared, and Zigbee are used for
connectivity. Mobile phones, desktops, laptops, tablets, play stations,
cordless mice, and headphones are personal devices that are used to create
wireless personal area networks. WPANs are safe networks, restricted to a
limited coverage range [2]. Examples of wireless personal area networks are
body area networks and smart home appliances. A wireless local area network
(WLAN) is a group of computers and related peripherals that are
interconnected in a constrained space, such as a school or office. A WLAN
connects devices for short-range communication of up to 100
meters. WLAN is also known as Wi-Fi. The coverage range of a WLAN is
a limited area such as an office building, healthcare provider, school,
hospital, or university. It connects different devices such as printers,
computers, and mobile phones. A small number of users can create a
temporary network without an access point. WLAN provides high-speed data
transfer rates of up to 200 Mbps over a short range. A wireless
metropolitan area network (WMAN) consists of several WLANs. While being
smaller than a WWAN, a WMAN is larger than a WLAN. The coverage range of a
WMAN is greater than that of a WLAN and extends to an entire city or
geographical area of up to 50 km. The IEEE 802.16 standard is used to
describe the WMAN. A wireless wide area network (WWAN) consists of several
WMANs. It has a large range, covering several neighbouring cities or states
or a country. WWAN is also referred to as cellular services. Through
satellite links, a wireless wide area network expands over a huge
geographical area and is not restricted to one site. A WWAN combines
multiple personal area networks, local area networks, and metropolitan area
networks to provide large-area coverage. GSM and GPRS are examples of
WWAN. Other previous studies with their findings are given in Table 3.1.


Table 3.1 Previous studies of security risks in wireless environment.


Author | Conclusion

Soo-Hwan Choi et al. [3] | The authors used embedded Bluetooth applications for wireless networks that can benefit from the methodology or algorithm disclosed in this paper.

Kalpana Sharma et al. [4] | According to the authors, due to wireless transmission and resource limitation on wireless sensor network, security designs utilized for conventional wireless networks are not a practical solution to the security issues. Therefore, the nodes are frequently positioned in unsafe conditions in which they are not physically shielded, which makes wireless sensor networks much more vulnerable.

Anitha S. Sastry et al. [5] | In this paper, the authors provide an overview of the numerous threats and security issues in each layer of wireless networks.

Yulong Zou et al. [6] | The authors explain the effective protective mechanisms for enhancing the security of wireless networks; the focus is on physical layer security, and security flaws and threats in a wireless environment are examined.

Yang Gao et al. [7] | In this paper, threats are categorized at physical, network and application layer using the architecture of cyber-physical systems. The authors also provide security breaches of cyber-physical system.

Hiren Kumar Deva Sarma et al. [8] | In this paper the authors identify many security risks that could exist in a wireless network. Mathematical models of the threats have been attempted.

Javier Lopez et al. [9] | The authors explain the summary and evaluation of a connection between the security threats, needs, and uses. They also explain the security requirements of current network standards.

Aditya Patel et al. [10] | According to the authors, wired and wireless networks are becoming more and more vulnerable to a new kind of security threats and flaws, rendering them unreliable and unsafe. They also provide a survey of various security threats and security measures used in an educational system.

Rashid Nazir et al. [11] | The security-related problems and difficulties are investigated in wireless networks. The authors also list the potential security risk and consider wireless network security measures.

Al-Sakib Khan Pathan et al. [12] | Wireless network technology introduces several security risks. In this paper, the authors explain the comprehensive approach to security that a wireless network should take to provide layered and strong protection.


3.3 Applications of Wireless Networks 


Wireless Networks and devices have a wide range of potential applications 
in human activities. Numerous industries use a wireless network, includ- 
ing those which track animals, monitor traffic, operate connected vehicles, 
and more. A wireless network is used for a variety of real-time applications 
as shown in Figure 3.2, such as home healthcare, environmental monitor- 
ing, military surveillance, etc., because of its mobility, flexibility, efficiency, 
easy installation and scalable nature [4]. Some application areas of wireless 
networks are given below: 


Internet access: The most important advantage of a wireless network is
the ability to share a single high-speed internet connection. Wi-Fi
and Bluetooth are both wireless network technologies.

Environment monitoring: Another main application of wireless
networks is environment monitoring. By using environment monitoring
we can observe and manage temperature, light, weather, etc. Environment
monitoring is used in many different applications such as agriculture
monitoring, forest monitoring, habitat monitoring, coal mining, earthquakes,
rainfall range, water quality, greenhouse monitoring, climate monitoring,
traffic, etc. By using the benefits of wireless networks, an environment
monitoring system is able to monitor conditions in real time.

Figure 3.2 Applications of wireless networks.


Healthcare application: Many wireless technologies, such as sensor
networks and RFID, are applied to healthcare applications. Many healthcare
centers implant RFID tags to identify their patients. By using different
wearable and implanted sensor equipment they can easily monitor and control
patient health indicators, such as blood pressure and heart rate. Because of
mobility, a patient can be monitored anytime, anywhere, and immediate
treatment can be provided in case of emergency.

Education: A wireless network is also useful in the field of education.
During the COVID-19 pandemic, classes were conducted
using video communication. We can attend any meeting, online class, or
seminar from anywhere. Online learning is already widely recognized as
the best alternative, allowing for the distribution of knowledge over both
time and location.

Industrial applications: Wireless networks are also used in industrial
applications for sensing and diagnostics, robotics, and machinery health
monitoring. They are used in a variety of industrial applications to
address a wide range of connected issues. Wireless network applications for
logistics make use of GPS technology. Such a system uses an embedded
terminal to find the items and a cloud service platform to identify the
recipient, in order to monitor the status of the goods in real time. The
development of wireless networks enables monitoring of electric machine
status and energy usage.

Smart homes: Wireless networks are used in indoor environments such
as smart homes, where machine-to-machine connections take place.
In a smart home, appliances such as lights, CCTV cameras, and child
monitoring systems can easily be operated remotely from mobile phones by
using wireless networks. This allows us to operate these devices anytime,
anywhere. Indoor motion monitoring and indoor air
quality monitoring are two common examples of smart home applications.

Connected vehicles: A vehicle that can connect to equipment nearby or
far away using a wireless network is referred to as a connected vehicle.
This is mostly used in vehicles for location tracking. By using GPS, we can
easily find the location of the vehicle.


3.4 Types of Attacks 


There are two types of attacks in a wireless environment: passive attacks 
and active attacks. These are explained below. 


3.4.1 Passive Attacks 


In passive attacks the attacker tries to learn something or obtain 
information. These attacks do not modify or remove the data; the hacker 
only captures the data in transit [13]. Figure 3.3 shows the types of 
passive attacks, which are explained below. 


3.4.2 Release of Message Contents 


In this type of attack, the hacker obtains the data without the permission 
of the sender and receiver of the communication system. For example, the 
sender sends an email to the receiver, and a hacker obtains the information 
from that email and sends it to someone else without the permission of the 
sender and receiver. 


Prevention: This type of attack can be handled by encryption so that the 
hacker cannot easily learn from the data. 


Figure 3.3 Types of passive attacks. 


3.4.3 Traffic Analysis 


Suppose we encode the data by using encryption: the hacker may capture 
the message but cannot read the data in it. In this type of passive attack, 
the hacker instead discovers the pattern of data flow during communication, 
such as who is communicating and how often. 

Prevention: use strong encryption algorithms together with traffic masking. 


3.4.4 Eavesdropping 


Eavesdropping is also known as snooping. Here, someone secretly listens 
to a private conversation between sender and receiver. Once the hacker can 
intercept the traffic, the data passing between devices may also be 
removed or modified. An everyday example of eavesdropping is listening to 
a quarrel between your neighbours through a vent in your apartment. 

Prevention: A VPN can be used to help prevent eavesdropping. 


3.5 Active Attacks 


In active attacks, the attacker not only steals the data but also modifies or 
deletes the message. Systems can be harmed by these attacks [11]. There are 
various types of active attacks, which are shown in Figure 3.4. 


Figure 3.4 Types of active attacks. 


3.5.1 Malware 


Malware is an unauthorized programme or piece of software that gains 
access to a target system and exhibits strange or harmful behaviour. Its 
effects include removing important files, capturing information, and 
blocking access to programs. 

Prevention: A proactive approach is used to prevent malware. Anti-malware 
and antivirus programs are used to detect it. 


3.5.2 Password Theft 


This is another security risk in a wireless environment. In password theft, 
someone guesses or steals the password, and the result is the loss of 
information. 

Prevention: There are two ways to prevent password theft. One method is 
robust protection that requires an additional device to log in, for 
example a login that succeeds only after confirmation on a mobile phone. 
The other method is to use complicated passwords that resist brute-force 
guessing. 


3.5.3 Bandwidth Stealing 


A wireless environment is also reachable by outside intruders. They can 
lower the speed of the network by downloading games, music, etc., over 
the internet connection. 

Prevention: Limit the bandwidth according to the number of employees to 
whom you want to provide access. 


3.5.4 Phishing Attacks 


Phishing attacks are very common nowadays. The hacker sends a link or 
attachment that requests sensitive data, such as a password, and induces 
the end user to click on that link. 

Prevention: Caution is the main defence against phishing attacks. Official 
emails from organizations do not ask for your password, so avoid entering 
a password on unknown links. 


3.5.5 DDoS 


DDoS is a distributed denial-of-service attack in which a hacker sends so 
many requests to the target server that the server cannot handle them and 
slows down or crashes. The server is overloaded by a large number of false 
data requests, which prevents authorized users from using its services. 

Prevention: DDoS is countered by detecting suspicious traffic and placing 
the server offline for maintenance. 
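
As a rough illustration of detecting suspicious traffic, the following Python sketch counts requests per source inside a sliding time window and flags a source that exceeds a threshold. The class name RateMonitor, the threshold, and the window length are illustrative assumptions and do not come from any specific product. A distributed attack spreads its requests over many sources, so a per-source counter like this is only one possible signal.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Hypothetical per-source request counter over a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history = defaultdict(deque)  # source address -> recent request times

    def is_suspicious(self, source: str, now: float) -> bool:
        """Record one request and report whether the source exceeds the limit."""
        times = self.history[source]
        times.append(now)
        # Forget requests that have fallen out of the window.
        while now - times[0] > self.window_seconds:
            times.popleft()
        return len(times) > self.max_requests
```

With a limit of three requests per second, a fourth request from the same source within that second would be flagged, while a request arriving several seconds later would not.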


3.5.6 Cross-Site Attack 


In a cross-site attack the attacker loads dangerous code into a website. The 
goal of this attack is to steal information or disrupt standard services. 

Prevention: Stopping a cross-site attack is still a challenge. It depends on 
the website owner’s ability to find and fix the injected code. 


3.5.7 Ransomware 


Data today is very important, and most of it is stored on computer 
systems. Ransomware is malware that installs itself on a system and then 
denies access to either the whole system or a particular part of it. The 
hacker encrypts the data on the system and demands a ransom to decrypt it. 
The targets of these attacks are systems or organizations that can readily 
pay a ransom in order to regain their data. An example is a banking system: 
its data is very valuable, so if bank staff cannot access customer data, it 
creates a serious problem. 

Prevention: Stopping this malware after installation is very difficult, so 
precaution is the main defence. The best prevention is to keep antivirus 
software updated and avoid any suspicious link. Regular backups and 
replication of data limit the damage if an attack succeeds. 


3.5.8 Message Modification 


In this type of active attack, the attacker modifies the message during com- 
munication. The attacker captures the message sent by the sender and then 
modifies it and sends the modified message to the receiver. 

Prevention: To stop this type of attack, use strong encryption algorithms. 
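
Encryption alone does not always reveal that a message was changed in transit, so a common complement is a message authentication code. The following sketch is an assumption added for illustration, not a method given in this chapter: it uses Python's standard hmac module with SHA-256, and the function names sign and verify and the shared key are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag that the sender attaches to the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag on the receiver side and compare in constant time."""
    return hmac.compare_digest(sign(key, message), tag)
```

If an attacker alters the message without knowing the shared key, the tag recomputed by the receiver no longer matches and the message can be rejected.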


3.5.9 Message Replay 


The attacker captures the message during transmission and then replays 
or retransmits this message to the receiver so that the receiver receives the 
same message multiple times. 

Prevention: Encryption is used to prevent this attack by encrypting the 
transmission between sender and receiver. By using scrambling, the server 
makes your data unreadable to an outsider. 
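
A replayed message is a valid message sent again, so receivers often also track freshness. The sketch below is an illustrative assumption rather than the chapter's method: each message carries a unique nonce and a send time, and the hypothetical ReplayGuard class rejects messages that are too old or whose nonce has already been seen. In practice the nonce and timestamp would themselves need to be authenticated.

```python
class ReplayGuard:
    """Hypothetical receiver-side filter that rejects stale or repeated messages."""

    def __init__(self, max_age_seconds: float = 30.0):
        self.max_age_seconds = max_age_seconds
        self.seen = {}  # nonce -> time the message was sent

    def accept(self, nonce: str, sent_at: float, now: float) -> bool:
        # Drop remembered nonces that are older than the freshness window;
        # any replay of them will be rejected by the age check below.
        self.seen = {
            n: t for n, t in self.seen.items() if now - t <= self.max_age_seconds
        }
        if now - sent_at > self.max_age_seconds:
            return False  # too old to be trusted
        if nonce in self.seen:
            return False  # same nonce seen before: a replay
        self.seen[nonce] = sent_at
        return True
```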


3.5.10 Masquerade 


In this type of active attack, the attacker pretends to be another authorized 
person. Using masquerade, the attacker wants to access the data of that 
authorized person. 

Prevention: By using authentication, we can stop this type of attack. 


3.6 Layered Attacks in WSN 


The open systems interconnection (OSI) model was developed by the 
International Organization for Standardization (ISO) in 1984. This model has 
seven layers and each layer has a specific task to perform. The computer systems 
employ these seven layers to interact over a network [5]. This section explains 
the attacks on the different layers of the OSI model, which are listed in Table 3.2. 


Table 3.2 Security attacks in OSI protocol layers. 

Layer | Attacks 
Application layer | Malware attack, SQL injection, cross-site scripting, FTP bounce, SMTP attack, attacks on reliability. 
Transport layer | TCP flooding, UDP flooding, desynchronization, TCP sequence and prediction attack, data integrity, energy drain attack. 
Network layer | Neglect and greed, homing, misdirection, hello flood attack, black holes, spoofing, sink holes, IP hijacking, Sybil attacks, node replications, worm holes, flooding, attack against privacy, internet smurf attack [12]. 
Data link layer | Collision, jamming, exhaustion, interrogation attack, Sybil attack, data aggregation, voting, MAC spoofing, identity theft, MITM attack, MAC flooding, unfairness [12]. 
Physical layer | Eavesdropping, jamming, tampering, side channel attack, Sybil attack, random interference and timing attack [8]. 


3.6.1 Attacks in Physical Layer 


Eavesdropping: An attack in which a hacker listens to personal data 
without the knowledge or permission of the sender and receiver. 

Jamming: Jamming the radio signal transmission interferes with the WSN’s 
radio frequencies, so the transmission of data is stopped. 

Tampering: The attacker intentionally modifies the data in a way that 
would be harmful for the users. 

Side channel attack: The attacker finds secret information by observing 
the physical properties of a cryptosystem. 

Sybil attack: The attacker creates multiple identities to slow down the 
network. 

Random interference: The hacker randomly interrupts the users of the 
communication system. 

Timing attack: The attacker measures the time taken to perform encryption 
or decryption in order to obtain a key. 


3.6.2 Attacks in Data Link Layer 


Collision: This occurs if a channel is already occupied by another sensor 
node, so a lot of data is lost due to collision. 

Jamming: This happens when a radio frequency from another broadcast 
interferes with the data transmission. 

MAC spoofing: This is also referred to as counterfeiting of a MAC address 
and is frequently used to gain access to wireless networks. The attacker 
probes a network to obtain a valid MAC address and then changes its own 
media access control address to match. 

Identity theft: Identity theft occurs when someone steals your personal 
information. It can be done in multiple ways; in the data link layer, the 
attacker steals the MAC address of the user. 

Man in the middle (MITM) attack: This occurs when an attacker places 
itself between the user and an application and interferes with their 
interaction. 

MAC flooding: This is a technique used to compromise the security of 
network switches. 


3.6.3 Attacks in Network Layer 


Hello flood attack: The attacker sends hello packets to a sensor node, 
pretending to be a neighbour node, and tries to obtain the data packets. 

Black holes: The attacker pretends to have the shortest path from a node to 
the base station; if the sensor node chooses that path, the malicious node 
captures the data. 

IP spoofing: This is the falsification of an IP address. 

Sinkholes: The attacker attempts to create a lot of traffic at the base 
station to interrupt the sensing data coming from the nodes. 

IP hijacking: In this type of attack, the attacker hijacks or steals IP 
addresses. 

Wormholes: In a wormhole attack, a malicious party replays messages that 
have been received in one area of the network in a different area, 
tunnelling them through a low-latency channel. 

Flooding: In flooding, an attacker sends many data packets to slow down 
the network. 

Attack against privacy: The attacker steals the personal information of 
the users through the network. 

Internet smurf attack: The attacker sends many ICMP requests to halt the 
network. 


3.6.4 Attacks in Transport Layer 


TCP flooding: Also known as a SYN flood, this is a form of DDoS; it takes 
advantage of part of the normal TCP three-way handshake to deplete the 
resources of the server and make it unavailable. 

UDP flooding: This is also a form of DDoS; the attacker floods random ports 
on the targeted host with IP packets containing UDP datagrams. 

Desynchronization: This is also referred to as TCP hijacking. It is a 
condition in which the expected sequence number and the sequence number 
in a received packet are different. 

TCP sequence and prediction attack: An attacker predicts the TCP sequence 
number in order to forge a packet that appears to come from a legitimate 
user. 

Data integrity: Data integrity attacks alter packets or introduce fake data 
into them, which corrupts the data being transmitted between WSN nodes. 

Energy drain attack: In a wireless network, sensor nodes have limited 
battery power, so the attacker sends false alarms that drain the battery 
power of the sensor nodes. 


3.6.5 Attacks in Application Layer 


Malware attack: Malicious software is produced by an attacker in the form 
of programs, scripts, and data. 

SQL injection: In this attack, an attacker uses a rogue SQL statement to get 
unauthorized access to the data behind trusted websites. 

Cross-site scripting: The attacker tries to bypass access control measures 
by adding client-side scripts to websites. 

FTP bounce: An attacker uses an FTP server to send outbound traffic to 
another server on the network to get unauthorized access. 

SMTP attack: Threats to the transmission of emails between SMTP server and 
client. 

Attacks on reliability: The attacker attempts to obtain the communication 
path by sending a false query. A node will experience energy drain when it 
responds to this false query. 
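
To make the SQL injection entry above more concrete, the following sketch shows one widely used defence, parameterized queries, using Python's standard sqlite3 module. The table, the column names, and the helper functions make_demo_db and find_user are hypothetical and exist only for this illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory database used only for this example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user(conn: sqlite3.Connection, username: str):
    """Look up a user; the ? placeholder binds username as data, not as SQL."""
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cursor.fetchall()
```

Because the input is bound as a value, a string such as ' OR '1'='1 is compared literally against the name column instead of changing the structure of the query.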


3.7 Security Models 


There are many methods to enhance the security in a wireless environment 
such as trust and reputation security models, secure routing protocols and 
intrusion detection systems. Trust and reputation security models improve 
the security in wireless environments as explained below [14]. 


3.7.1 Bio-Inspired Trust and Reputation Model 


BTRM-WSN chooses the most reliable node, reached along the most reliable 
path, that provides a specific service [14]. It is based on the Ant Colony 
System (ACS), a bio-inspired algorithm in which ants construct routes 
through a graph to satisfy certain constraints. The ants leave behind 
pheromone traces that help other ants locate and travel along the same 
paths. Ants use these pheromone values to determine the best routes, 
because the best path will have the highest concentration of pheromone. 
When the ACS algorithm is applied to a trust and reputation system, the 
pheromone value represents the credibility of a sensor. Each sensor keeps 
pheromone traces for its neighbours, which determine the likelihood that an 
ant will choose a given path. Artificial ants are created and depart from 
the client sensor. When an ant moves from one sensor to another, it 
instructs these sensors, via equation 3.1 and equation 3.2, to change the 
pheromone value of the route between them. 

$\tau_{ij} = (1 - \varphi) \cdot \tau_{ij} + \varphi \cdot \Omega$    (3.1) 


$\Omega = (1 + (1 - \varphi) \cdot (1 - \tau_{ij}) \cdot \eta_{ij})$    (3.2) 


Several scenarios can occur when an ant k reaches a sensor s. In the first 
scenario, sensor s provides the service. The average pheromone value of the 
route taken by ant k from the client to sensor s, which lies in [0,1], is 
then calculated. If sensor s has more neighbours that have not been visited 
by ant k, ant k stops and returns the solution if this average exceeds the 
predetermined transition threshold (TraTh); otherwise it continues to 
another sensor. Ant k also stops and returns the solution if sensor s has no 
more neighbours or if every one of them has already been visited. In the 
second scenario, sensor s does not offer the service. If sensor s still has 
neighbours that ant k has not visited yet, ant k chooses which node to 
travel to next. Ant k runs into trouble if sensor s has no more neighbours 
or if each one has already been explored. It must then retrace its steps 
until it reaches either: 


a) A sensor that provides the required service. 
b) A sensor that does not, but has other nearby sensors that 
have not been visited yet. 


The client will look over and evaluate the calibre of each launched ant’s 
response. Equation 3.3 is used to calculate the route quality. 


$Q(S_k) = \dfrac{\bar{\tau}_k}{\mathrm{Length}(S_k)^{PLF}} \cdot \%A_k$    (3.3) 


where $\varphi$ is the parameter controlling how much pheromone the ants 
leave behind, $\tau_{ij}$ is the pheromone value of the route between 
sensors i and j, and $\Omega$ is the convergence value of $\tau_{ij}$. 

$S_k$ is the solution given back by ant k, and $Q(S_k)$ is the quality of 
the selected route. 

$\bar{\tau}_k$ is the average path pheromone of route $S_k$, $\%A_k$ is the 
percentage of ants that returned the same solution as ant k, and PLF is the 
path length factor. 
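
The pheromone update and the route quality can be sketched in a few lines of Python. This is a minimal illustration, not the BTRM-WSN implementation: it assumes equation 3.1 reads tau_ij = (1 - phi) * tau_ij + phi * Omega and equation 3.3 reads Q(S_k) = avg_tau_k / Length(S_k)^PLF * %A_k, takes the convergence value Omega as a given input, and uses hypothetical function and parameter names.

```python
def update_pheromone(tau: float, omega: float, phi: float) -> float:
    """Equation 3.1 (as assumed above): blend the current pheromone
    value with the convergence value omega, weighted by phi."""
    return (1 - phi) * tau + phi * omega


def route_quality(avg_pheromone: float, length: int, plf: float, pct_ants: float) -> float:
    """Equation 3.3 (as assumed above): average path pheromone, penalized
    by path length raised to PLF, scaled by the share of ants that
    returned the same solution."""
    return avg_pheromone / (length ** plf) * pct_ants
```

With these definitions, a longer route with the same average pheromone receives a lower quality score whenever PLF is positive, which is how path length is taken into account.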


3.7.2 Peer Trust System 


The basic goal of the dynamic peer-to-peer trust and reputation model 
known as the peer trust model is to estimate and assess a peer’s 
reliability or quality in an online commercial context [14]. To calculate 
the reliability value of a given peer, it identifies five factors related 
to trust and reputation management: 1) the responses a peer receives from 
others; 2) the response scope or field; 3) the source credibility; 4) the 
transaction scenario variable, designed to capture how critical a 
transaction is; and 5) the community scenario variable, which accounts for 
community-related features. In wireless networks, equation 3.4 can be used 
to determine the trust value of peer u. 


$T(u) = \alpha \cdot \sum_{i} S(u,i) \cdot CR(p(u,i)) \cdot TF(u,i) + \beta \cdot CF(u)$    (3.4) 


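where $T(u)$ is the reliability value of peer u, 

$\alpha$ is the weight factor used for the response-based evaluation, 

$\beta$ is the weight factor used for the community scenario variable, 

$S(u,i)$ is the normalized amount of satisfaction that peer u received in 
transaction i, 

$CR(p(u,i))$ is the credibility of $p(u,i)$, the other peer taking part in 
transaction i, and $TF(u,i)$ and $CF(u)$ are the transaction and community 
scenario variables listed above. 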
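
A small worked example may help to read equation 3.4. The sketch below assumes the equation has the form T(u) = alpha * sum_i S(u,i) * CR(p(u,i)) * TF(u,i) + beta * CF(u); the function name peer_trust and the sample numbers are hypothetical.

```python
def peer_trust(transactions, community_factor, alpha, beta):
    """Trust value of a peer under the assumed form of equation 3.4.

    transactions: iterable of (satisfaction, credibility, transaction_factor)
    tuples, one per transaction of the peer.
    """
    feedback_term = sum(s * cr * tf for s, cr, tf in transactions)
    return alpha * feedback_term + beta * community_factor
```

For two transactions with (satisfaction, credibility, transaction factor) of (1.0, 0.5, 1.0) and (0.5, 1.0, 1.0), a community factor of 0.5, and both weights set to 0.5, the feedback term is 1.0 and the trust value is 0.5 * 1.0 + 0.5 * 0.5 = 0.75.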

Therefore, these are two security models by which we can enhance the 
security in wireless networks. 


3.8 Case Study: Healthcare 


Wireless communication is beneficial for real-time applications in 
entertainment, transport, shopping, industry, medicine and many other 
areas, and wireless networks have the potential to revolutionize the way 
we live. One of the most important applications of wireless networks is 
healthcare, where the network is referred to as a wireless medical sensor 
network (WMSN). The main concerns of a WMSN are patient mobility and 
reliable communication. Creating wireless healthcare applications 
presents many challenges, including timely distribution of data, quick 
event detection, and power management. 

Figure 3.5 shows the healthcare system. In this figure, multiple sensors 
are attached to the human body. These sensors can be wearable devices, 
such as a smart watch, or implants in the human body. The sensors are 
connected to a cell phone or gateway over a wireless network, which in 
turn is connected to the internet and sends information to different 
recipients: for example, to call an ambulance, to inform the hospital and 
family members, or to obtain a prescription from a specialist doctor, so 
that immediate treatment can be provided in case of emergency. Later, this 
information can be saved to a database through the server. 

Figure 3.5 Application of wireless network towards healthcare. 


3.8.1 Security Risks in Healthcare 


Hospitals and other healthcare facilities face many active and passive 
attacks [15]. In a passive attack the attacker steals the information but 
does not modify or delete it; in healthcare, for example, attackers capture 
confidential reports without modifying them. These days, one of the most 
common active attacks is ransomware, which affects confidentiality. 
According to a survey, the rate of healthcare ransomware attacks is 
rising, making the industry more vulnerable to a wide range of threats 
[16]. In this attack the hacker locks the whole system or part of it and 
asks for a ransom to unlock the sensitive and confidential patient-related 
data. According to one report [17], the number of ransomware attacks on US 
healthcare organizations increased 94% from 2021 to 2022. Healthcare 
centres are a regular target for ransomware because they rely so much on 
access to data, such as patient information, to keep their operations 
running smoothly. 

Figure 3.6 shows the ransomware attack cycle. As the figure explains, the 
attacker first sends malicious code or a malicious link to the target 
system through a phishing e-mail. Once the user clicks on the link or 
opens the code, the malicious code executes and begins searching for files 
with important extensions. After this, the attacker gains command and 
control over the system and encrypts its resources. Encryption can be 
applied to the whole system or to part of it. The attacker then demands a 
ransom, most commonly in bitcoin. When the amount is sent, the attacker 
decrypts or unlocks the system. 

Figure 3.6 Ransomware attack in healthcare. 


3.8.2 Prevention from Security Attacks in Healthcare 


A firewall is one means of preventing passive attacks, and antivirus 
software should be kept up to date. For ransomware, precaution is the main 
defence: avoid clicking on suspicious links, because the hacker will send 
a malicious link via email or through an advertisement to compromise the 
system. Create timely backups and keep redundant copies of data to limit 
the impact of this attack. Many healthcare centres opt to purchase cyber 
insurance to minimize the financial risk involved with such an attack. 
Some steps can be taken if you are facing a ransomware attack, as shown in 
Figure 3.7. 

Figure 3.7 Steps to take after a ransomware attack in wireless networks. 


Step 1: Isolate the computers so that this malware does not infect the other 
systems. 

Step 2: Try to identify the malware or malicious link. 

Step 3: Once you identify that malware, remove it. 

Step 4: Notify the employees. 

Step 5: Contact your insurance agent regarding this. 


3.9 Minimize the Risks in a Wireless Environment 


Figure 3.8 shows steps to avoid risks in a wireless environment. You can 
follow these easy procedures to secure your wireless network [18]. These 
steps are given below: 


3.9.1 Generate Strong Passwords 


Generate strong passwords of at least eight characters that include at 
least one uppercase letter, a number, and a special character. 
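
The password rule in this section can be checked automatically. The following Python sketch is one possible reading of that rule (at least eight characters, with an uppercase letter, a digit, and a special character); the function name is hypothetical, and treating Python's string.punctuation set as the list of special characters is an assumption.

```python
import string


def is_strong_password(password: str) -> bool:
    """Check the rule above: length >= 8, one uppercase letter,
    one digit, and one special (punctuation) character."""
    return (
        len(password) >= 8
        and any(ch.isupper() for ch in password)
        and any(ch.isdigit() for ch in password)
        and any(ch in string.punctuation for ch in password)
    )
```

A check like this only enforces the minimum composition rule; it does not detect passwords that are common or already leaked.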

Figure 3.8 Steps to avoid risks in wireless environment. 


3.9.2 Change Default Wi-Fi Username and Password 


Change the default Wi-Fi username and password. Give your SSID a new name, 
alter its default settings, and turn off its broadcast to users. Avoid 
connecting to unknown open public Wi-Fi networks. 


3.9.3 Use Updated Antivirus 


Use up-to-date antivirus software. Install a host-based firewall, make 
sure your security policy is robust, and apply that policy in the 
rule-based firewall configuration. 


3.9.4 Send Confidential Files with Passwords 


When transferring data or files, protect them with a password. For 
example, a bank may e-mail a KYC document as a PDF whose password is the 
recipient’s date of birth, so that only the authorized person can see the 
data. 


3.9.5 Detect the Intruders 


Install a network-based intrusion detection system, analyze its logs 
weekly, and use updated antivirus software. Updates for antivirus software 
are distributed from servers to clients. Make frequent data backups and 
restore data as needed. 


3.9.6 Encrypt Network 


Encryption is used to protect the data in the wireless network. Encrypt 
all WLAN traffic at all times. Application-level encryption such as 
Pretty Good Privacy (PGP) and Secure Sockets Layer (SSL) should be used so 
that the attacker cannot easily read the data. It is advisable to use a 
current encryption standard such as WPA3 for the wireless link. Encryption 
should also be applied to protect wireless network traffic against traffic 
analysis attacks. If the wireless network will not be used for a long 
time, turn it off to keep hackers out. If the network is not secure, do 
not open any important data or enter credit card information. 


3.9.7 Avoid Sharing Files Through Public Wi-Fi 


Avoid sharing important files over public Wi-Fi. Most public Wi-Fi 
networks are not safe for sharing private information. Restrict access to 
the wireless network to authorized persons only. 


3.9.8 Provide Access to Authorized Users 


Only authorized users should be allowed access. Create a guest account 
with visitor permissions on a different wireless channel, so that a guest 
who needs to access the network never uses the primary credentials and 
their confidentiality is protected. This also ensures that employee and 
visitor traffic is routed through different network channels. 


3.9.9 Use a Wireless Controller 


A wireless controller is a device that coordinates the provisioning, 
operation and management of access points. Once the access points register 
with this controller, the entire wireless network can be configured and 
operated as a single entity from the controller’s interface. 


3.10 Conclusion 


In this chapter we presented a study of security risks in wireless 
networks and of defensive techniques that protect the availability, 
confidentiality, and integrity of the network against malicious 
intrusions. Commercial applications, as well as the public and private 
sectors, are increasingly using wireless networks. Sensor networks are in 
high demand for real-world settings; therefore, it is necessary to provide 
efficient and usable security mechanisms that can protect the network 
against attacks. Most security issues in a wireless network are due to the 
insertion of fake information by malicious nodes. The network becomes 
vulnerable because access points or wireless devices are installed in a 
public environment. It is very easy to interfere with a wireless network 
and challenging to prevent it. This chapter described the security attacks 
across the protocol stack, from the physical layer to the application 
layer, and security models that minimize the risks in wireless networks. 
Moreover, it provided insights from a healthcare case study along with its 
risks and preventions. 


Abstract 

Traditional data communication networking is not suitable for Industry 4.0, 
so industry is moving to software-defined networks for managing its 
networks. But security is an important concern in a software-defined 
network, because attackers can easily gain access to industrial 
resources. Denial of Service (DoS) and Distributed Denial of Service 
(DDoS) attacks cause heavy damage in the production area. Malicious 
attacks create a service gap: network throughput drops, services go down, 
and business continuity is lost. A traditional Intrusion Detection System 
(IDS) detects malicious traffic based on a predefined access control list, 
but it cannot detect new malicious traffic entering the home network. 
Machine learning techniques can lead to better identification of threats 
in synthetic or real-time data. To avoid these situations, we propose a 
model that finds the attackers in the network and can be trained to find 
new types of intruder in the system. 


4.1 Introduction 


The fundamentals of networks have not changed much during the past 
ten years, despite substantial advancements in computer science. The 
ever-evolving needs of the informatics industry and the expansion of 
emerging data centers have demonstrated that the conventional approach 
to network administration is no longer adequate. Rapidly developing 
technologies use the Internet of Things to access everything in the 
computation field [18]. Most people use the internet to develop their 
business, so network administration needs to be more active and more 
amenable to integration with novel structures. Applications using 
distributed architectures, and the variety of users with new-generation 
devices, have created the requirement for networks that can be readily 
maintained without human intervention. A method of network management 
that does not take these factors into account could cause confusion in 
services that are already constantly evolving and under construction. 


4.1.1 Software-Defined Network 


SDN has adopted a new architecture and is taking the place of conventional 
networks because of its capacity to respond quickly and easily to new 
events. In a software-defined network there is no need to monitor each 
device in the network individually. Since network control is directly 
programmable, network infrastructure components such as switches and 
routers are separated from network services. Network control and 
forwarding operations are separated in the SDN architecture [12]. 

In a software-defined network the control plane and data plane are 
separated for efficient use of the components. The controller plays a 
vital role in managing and guiding every switch in the network: the 
control plane manages and monitors all the nodes. The data plane is the 
network architectural layer that physically handles the traffic, on the 
basis of the configuration provided by the control plane. The data plane 
is nothing more than a group of interconnected switches. These switches 
are in charge of acting on each received packet in accordance with the 
flow rules listed in the flow table. The switch keeps each new packet that 
enters it in a buffer and then checks whether the flow table contains a 
rule matching the buffered packet. If there is no current rule for that 
particular packet in the flow table, a packet-in message is forwarded to 
the controller to create one, and the switch’s flow table is then updated 
with this new flow rule. Because the switch’s buffer and flow table have a 
limited amount of memory, they are vulnerable to DoS attacks. Given the 
limited storage size of the switch’s flow table, a hacker who sends spoofed 
packets with forged IP addresses forces a new rule to be injected into the 
switch for each one. As a result, a hacker can quickly produce a large 
number of packets and send them to unknown network hosts, saturating the 
buffer and rapidly filling the flow table. Unfortunately, this results in 
normal, legitimate traffic not being routed through the switch [1]. 
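
The flow-table saturation described above can be illustrated with a toy model. The following Python sketch is a simplification written for this explanation, not an OpenFlow implementation: the Switch class, the simple_controller function, the capacity of three rules, and the "table-full" result are all hypothetical, and real switches also expire or evict rules.

```python
class Switch:
    """Toy model of an SDN switch with a bounded flow table."""

    def __init__(self, capacity, controller):
        self.capacity = capacity
        self.controller = controller  # callable standing in for the SDN controller
        self.flow_table = {}          # (source, destination) -> action
        self.packet_in_count = 0      # table misses reported to the controller

    def handle(self, src, dst):
        key = (src, dst)
        if key in self.flow_table:
            return self.flow_table[key]  # a matching rule already exists
        # Table miss: a packet-in message goes to the controller.
        self.packet_in_count += 1
        action = self.controller(src, dst)
        if len(self.flow_table) >= self.capacity:
            return "table-full"          # no room to install the new rule
        self.flow_table[key] = action
        return action


def simple_controller(src, dst):
    """Hypothetical controller policy: always forward towards the destination."""
    return f"forward-to-{dst}"
```

In this model, once spoofed source addresses have filled the table, a new legitimate flow receives "table-full", while flows whose rules were installed earlier continue to match.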


4.1.2 Types of Attacks 

Denial of Service (DoS) 

• Volume-based attacks 
• Protocol attacks 
• Application-layer attacks 
• UDP flood 
• Internet Control Message Protocol (ICMP) flood 
• Ping of death 
• Slowloris 
• NTP amplification 
• HTTP flood 


Distributed Denial of Service (DDoS) 

• Application-layer attacks 
• Protocol attacks 


4.1.2.1 Denial of Service 


The DoS (Denial of Service) attack is the most prevalent type of security 
risk for networked systems. By monopolizing network resources, such as a 
server, it seeks to prevent intended users from using them. The basic idea 
behind a DoS attack is to send IP packets to a victim in order to produce 
saturation or instability. As DoS attacks become increasingly 
sophisticated and varied, they are becoming more difficult to detect. 
Furthermore, anyone with attack tools can carry out DoS attacks with ease, 
even without technical expertise, so DoS remains a serious hazard. 
Moreover, because the headers of attack packets are fabricated to look 
like normal ones, it is difficult to distinguish between normal and attack 
traffic [14]. 

Figure 4.1 Operation of denial of service. 

In the denial of service an attacker floods the packets in a network in 
order to create trouble for the software network as shown in Figure 4.1. 
Flooded packets create overloaded services to the network and reduce its 
service. Another important impact of the DoS attack is to deny the needed 
website to the user at a time of emergency. Packet injection is another type 
of attack which reduces the performance of the network. The flooded pack- 
ets will create anonymous traffic delay in the network which creates an 
artificial delay to the user. The user believes that the delay in the traffic 
process is because so many communications happen in the network [2]. 


Volume-based assaults

The key goal of the attacker is to saturate the bandwidth of the site,
which is measured in bits per second. This form of attack includes
spoofed-packet floods, ICMP floods, and UDP floods.


Protocol Assaults

The main purpose of this attack is to consume the actual resources of the
server and of the components used for communication, such as load
balancers and firewalls. The attack rate is measured in packets per
second. Attacks such as Ping of Death, Smurf denial of service, SYN
floods, and fragmented packet attacks fall into this category.


Application-Layer Assaults

This type of assault, which is measured in requests per second, aims to
take down the web server. It targets certain platforms such as Apache,
OpenBSD, and Windows. Two examples of these attacks are

• GET/POST floods
• Low-and-slow attacks.


UDP Flood

As the name suggests, a UDP flood attacks the host with User Datagram
Protocol (UDP) packets. The aim is to flood random ports on the remote
host. For each packet, the host looks for an application listening on that
port and, if none is found, responds with an ICMP message indicating that
the destination is unreachable. This consumes the host's resources and
makes services unavailable.


Internet Control Message Protocol Attacks

An ICMP flood is comparable to a UDP flood: it sends ICMP echo request
packets to the target at a high transmission rate without waiting for a
response. Because the victim servers try to answer every request with ICMP
echo reply packets, the attack uses both incoming and outgoing bandwidth
and causes the system to slow down or crash.

A SYN flood exploits a known weakness in the TCP connection sequence, the
three-way handshake. Any SYN request sent to open a TCP connection with a
host server must first be answered by that host with a SYN-ACK response
and then confirmed by the requestor with an ACK message [25, 26]. During a
SYN flood attack, the requestor transmits numerous SYN requests but never
responds to the host's SYN-ACK response, or it transmits the SYN requests
from a fake or hidden IP address. The host server then waits for each
request to be acknowledged, holding resources until no new connections can
be established, which finally leads to DoS. These assaults therefore
affect the responses that legitimate service requestors receive.
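
The handshake weakness can be made concrete with a small sketch. The `SynBacklog` class and its fixed queue size are hypothetical simplifications; a real TCP stack also uses timeouts and retransmissions, which are left out.

```python
class SynBacklog:
    """Toy model of a server's queue of half-open TCP connections."""

    def __init__(self, size):
        self.size = size
        self.half_open = set()

    def on_syn(self, client):
        """Server receives a SYN; it can answer only if the backlog has room."""
        if len(self.half_open) >= self.size:
            return False  # backlog exhausted: the request is refused
        self.half_open.add(client)  # SYN-ACK sent, waiting for the final ACK
        return True

    def on_ack(self, client):
        """The final ACK completes the handshake and frees the backlog slot."""
        self.half_open.discard(client)


backlog = SynBacklog(size=5)

# A legitimate client completes all three steps, so its slot is released.
backlog.on_syn("client-a")
backlog.on_ack("client-a")

# The attacker sends SYNs from spoofed addresses and never sends the ACK.
for i in range(5):
    backlog.on_syn("spoofed-%d" % i)

accepted = backlog.on_syn("client-b")  # a new legitimate client is refused
print(len(backlog.half_open), accepted)
```

Because the spoofed requests are never acknowledged, their slots are never released, and the next legitimate request finds no room.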


Ping of Death

This kind of attack entails sending the server a steady stream of
malicious or malformed pings. The maximum length of an IP packet, including
the header, is 65,535 bytes, whereas the maximum frame size allowed by the
data link layer over Ethernet is 1,500 bytes. A large IP packet is
therefore split into several IP fragments, and the receiving host
reassembles the fragments into the complete IP packet [27, 28]. When the
fragment contents have been maliciously manipulated, the packet that the
receiver reassembles is larger than 65,535 bytes. If the memory space
allotted for the packet is exceeded, even genuine and authentic packets
may experience denial of service [15].
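
The size arithmetic behind this attack can be checked with a short sketch. It assumes a 20-byte IP header without options and represents each fragment as a hypothetical `(offset_units, payload_len)` pair, where the fragment offset counts units of 8 bytes as in the IPv4 header.

```python
MAX_IP_PACKET = 65535  # maximum IP packet length, header included
IP_HEADER_LEN = 20     # header without options (assumed here)


def reassembled_length(fragments):
    """Length of the packet rebuilt from (offset_units, payload_len) pairs.

    The fragment offset field counts units of 8 bytes.
    """
    end_of_data = max(offset * 8 + length for offset, length in fragments)
    return IP_HEADER_LEN + end_of_data


def is_oversized(fragments):
    """True if reassembly would exceed the maximum IP packet length."""
    return reassembled_length(fragments) > MAX_IP_PACKET


# A 4,000-byte payload split for a 1,500-byte Ethernet frame.
normal = [(0, 1480), (185, 1480), (370, 1040)]
# A forged last fragment placed near the end of the offset range.
malicious = [(0, 1480), (8189, 1480)]

print(reassembled_length(normal), is_oversized(normal))
print(reassembled_length(malicious), is_oversized(malicious))
```

A receiver that performs this check before copying fragment data can discard the forged fragment instead of overflowing its reassembly buffer.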
Slowloris

This kind of attack is highly targeted: it allows one web server to take
down another server without affecting other ports or services on the
target network [13]. It accomplishes this by opening connections to the
target server but transmitting only incomplete requests, holding as many
connections to the web server open for as long as possible. It continually
sends more HTTP headers, but it never completes a request. The targeted
server keeps each of these false connections open, which reduces the
capacity available for valid requests. As implied by the name, this slows
down the entire system by exceeding the allowable number of concurrent
connections [19].


NTP Amplification

In this kind of assault, the attacker exploits Network Time Protocol (NTP)
servers in order to overwhelm a host server with a UDP flood. The term
“amplification attack” refers to situations where the ratio of request to
response is 1:100, 1:220, or higher. It means that an attacker who has
access to a list of available NTP servers can launch high-volume DoS
assaults that consume the maximum bandwidth. Only the NTP protocol is the
focus of this kind of assault [17].


HTTP Flood

In this instance, the attacker uses seemingly standard and legitimate HTTP
GET or POST requests to attack a web application or web server [21]. The
attack does not employ reflection, spoofing, or malformed-packet
techniques. Compared to other types of assaults, it needs the least
bandwidth to slow down an application or host server. It is most effective
when it forces the server or application to allocate the maximum resources
in response to every single request.


4.1.2.2. Distributed Denial of Service 


A distributed denial-of-service (DDoS) assault takes place when several
systems attack a server with fake traffic, as shown in Figure 4.2.
Eventually, the server becomes overloaded and either crashes or stops
responding even to valid requests. From 2020 to 2021, DDoS attacks
increased by 341%. This was primarily because the COVID-19 pandemic forced
numerous companies to convert to digital operations, which inevitably
increased their vulnerability to attackers. A DDoS attack is one of the
most feared cyberattacks. A well-planned DDoS attack may be very difficult
to avoid and equally difficult to stop. Even the servers of the most
cutting-edge IT companies are susceptible; an attack may start at any time
and render the server unusable. In 2018, GitHub was hit by the
then-largest DDoS attack on record, which flooded its servers with over
120 million data packets every second. The basic idea is the same
regardless of the size of the onslaught: overwhelm a server with requests
that it cannot process, and do so repeatedly until it crashes or stops
responding. Repairing service interruptions can frequently take hours and
result in significant financial losses [20].

Figure 4.2 Working model - distributed denial of service.


Application Layer Assaults 

The server creates the reply to a received client request at the
application layer. For instance, when a person types
http://www.nkr.com/resources/ into their browser, an HTTP request for the
resource page is transmitted to the server. The server collects the
page-related data, compiles it into a reply, and sends it back to the
browser. The application layer is where the information is fetched and
packaged. An application layer attack occurs when an attacker uses various
bots or tools to bombard the server repeatedly with requests for the same
resource. One example is repeatedly asking a server to generate PDF
booklets. The server cannot identify the attack because the IP address and
other identifiers vary with each request [22].


Protocol Attacks 

Protocol attacks consume all of the capacity of web servers and other
resources, such as firewalls. They make the target unreachable by
exploiting weaknesses in the network layer and transport layer. An example
of a protocol attack is a SYN flood, in which the attacker bombards the
target with a large number of TCP handshake requests carrying spoofed
source IP addresses. The goal is to overwhelm the target: the attacked
servers attempt to answer every connection request, but the final step of
the handshake never takes place [23].


4.2 Related Works 


The related work reviewed here focuses on attacks on software-defined
networks. SDN Secure Control and Data Plane (SECOD) is a method that
employs novel triggers to identify and prevent denial-of-service assaults
in both the control and data planes. An SDN testbed was established to
replicate and study the consequences of denial-of-service assaults on the
hardware switch and the SDN controller, and SECOD's capability to
recognize and prevent such assaults was confirmed and tested on it. The
findings demonstrate that denial-of-service attacks can be recognized and
countered, strengthening SDN's security characteristics. The authors also
found that a dynamic threshold would increase the resource efficiency and
flexibility of both the controller and the switch [3].

Another work presents an SDN-based application that integrates with the
OpenDaylight controller to record and analyse traffic heading toward the
victim. Within a period of between 100 and 150 s, it identifies and
mitigates DDoS assaults and restricts them at their source [24]. Taking
into account SDN standards, their strengths and limits, and the fact that
the SDN specification prescribes packet forwarding to the controller, the
work focuses on a solution designed specifically for SDN [4].

There is also a strategy to counter SIP DoS attacks. Compared to the
conventional protection strategy, it conserves bandwidth and makes full
use of the restricted system cache, which considerably increases
performance while defending against SIP DoS assaults. The authors tested
this strategy and confirmed its effectiveness. However, the simulation
shows that the size of the low-priority queue still shrinks gradually [5].

HDB is a technique based on a sender's past incidents. The HDB system
estimates the proportion of the traffic stream that should be mirrored to
the intrusion detection system (IDS). The controller consults the HDB to
get the sender's incident details. If the sender has no incidents logged
in the HDB, the traffic mirrored to the IDS is set to the specified
minimum. If there are incidents and their number is equal to or greater
than the specified threshold, the traffic mirrored to the IDS is set to
the specified maximum. Otherwise, a traffic ratio is used to determine how
much traffic is mirrored to the IDS. In comparison to traditional schemes,
the HDB strategy decreases the traffic on the link to the IDS by an
average of 54.1% in the test case [6].
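
The three-way decision described for HDB can be sketched as follows. The function name, the parameter names, and in particular the linear scaling used for the intermediate case are assumptions made for illustration; the text does not specify how the traffic ratio is computed in [6].

```python
def mirror_fraction(incidents, threshold, min_fraction, max_fraction):
    """Fraction of a sender's traffic to mirror to the IDS.

    incidents: number of past incidents logged for the sender
    threshold: incident count at which the maximum applies
    """
    if incidents == 0:
        return min_fraction  # no history: mirror only the minimum
    if incidents >= threshold:
        return max_fraction  # heavy history: mirror the maximum
    # Assumed rule: scale linearly between the two bounds.
    ratio = incidents / threshold
    return min_fraction + ratio * (max_fraction - min_fraction)


# Hypothetical incident history keyed by sender address.
hdb = {"10.0.0.1": 0, "10.0.0.2": 5, "10.0.0.3": 12}
for sender, count in hdb.items():
    print(sender, mirror_fraction(count, threshold=10,
                                  min_fraction=0.1, max_fraction=1.0))
```

Senders with a clean history contribute little traffic to the IDS link, which is where the reported reduction in mirrored traffic would come from.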

Another study considers the effect of a denial-of-service assault on the
bandwidth between two hosts in a software-defined network and how it
affects the OpenDaylight and Python-based SDN controllers. After a
denial-of-service assault was initiated, both the OpenDaylight and the
Python-based SDN controller showed low bandwidth. The DoS attack still had
a detrimental impact on a host's bandwidth even after a valid user
connected to a server. This result is brought about by the switch's memory
limitations, which prevent it from adding a flow entry for a valid user
once a flow timeout has occurred. Another contributing factor is the
controller congestion experienced while handling Packet-In events and
attempting to install flow entries for packets whose buffers have been
removed from a switch. To stop any one source of traffic from overwhelming
the SDN architecture, it may be possible to apply a packet rate
restriction. But this needs to be done cautiously, especially if several
hosts on a network are accessing the same server simultaneously [7].

Another work examines the viability of the Distributed Self-Organizing Map
(DSOM) for DoS attack detection. In the suggested method, several DSOMs
are active and detect DoS assaults individually at various points. To
eliminate map divergence, they are combined into a single SOM by a
weighted sum [16]. Trials using actual data demonstrated that DSOM is
capable of matching the detection performance of the original SOM
approach. Adapting the DSOM to a Software-Defined Network (SDN)
environment is left for future development [8].

Another strategy uses two new information measures, the RE metric and the
ID metric, to classify low-rate distributed denial-of-service assaults.
The data plane of a software-defined network is seriously threatened even
by a tiny volume of malicious traffic. Although this kind of assault is
quite difficult to identify, it is crucial to study it as soon as it
manifests. The number of control events has the most influence on the SDN
controller layer [9]. The controller layer experiences congestion as the
number of events rises, which reduces the available server resources.
Shannon entropy alone is insufficient in this case because it cannot rule
out false alerts. As a result, RE can be used as an information distance
metric for low-rate DDoS attacks [10].

To identify and counteract DDoS attacks, the authors of another work
created the lightweight DOCUS model, which they built as an additional
module for the Python-based controller. DOCUS consists of three
components: modified switch behavior, detection, and mitigation [20]. The
switch behavior is modified using the number of SYN and final ACK packets
seen at the network controller, so that the switch accurately counts the
number of half-open and complete connections. Because the detection module
employs the stateless CUSUM method, DOCUS is both generic and stateless.
The CUSUM algorithm in the detection module was altered to identify DDoS
assaults and to distinguish flash-crowd traffic from attack traffic [11].
Table 4.1 lists the malicious attack detection methods and their
limitations.


Table 4.1 Relative study of connected works.

No. | Detection method | Parameters | Key feature | Limitation
12 | SDN Secure Control and Data Plane (SECOD) | MAC & Dest IP | Dynamic threshold increases the resource efficiency and flexibility of both the controller and the switch | Only used for single controllers
13 | SIP DoS attack strategy | In time and out time of the packet | Low-priority queue is still shrinking gradually | As the number of data packets increases, the process automatically slows down
14 | HDB technique | Traffic ratio, bandwidth | Based on a sender's past events | Mirroring is difficult if the traffic is high
15 | OpenDayLight controller (ODLC) | IP & MAC, Web URL | Minimum time required to analyse the attacker in the system | Packet forwarding to a specific machine leads to many types of attacks on the system
23 | Effects of the DoS attack on the bandwidth | Bandwidth | Bandwidth between two hosts in the software-defined network | If the bandwidth is low, the efficiency is automatically reduced
24 | Distributed Self-Organizing Map (DSOM) | Time, bandwidth | Automatic detection of new types of attacks | Takes time to identify the attack in the initial stage
 | RE metric and ID metric, to identify low-rate DDoS attacks | MAC address | Easy identification of DDoS attacks | Difficult to find the attack during heavy traffic
 | Lightweight DOCUS model | MAC address | Less memory is needed to implement the model | Difficult for huge traffic

4.3 Proposed Work for Threat Detection and Security 
Analysis 


The proposed work detects malicious DDoS and DoS attacks using the
following modules: traffic collection, feature selection using entropy,
malicious traffic detection, and traffic mitigation, as shown in Figure
4.3. Malicious nodes are detected in two stages using a machine learning
approach. In stage 1, traffic is grouped using K-Medoids; in stage 2,
traffic is filtered using multinomial regression.


4.3.1 Traffic Collection 
4.3.1.1 Data Flow Monitoring and Data Collection 


With the increase in internet usage, the flow of data across the globe has
increased substantially. Given the developments in internet technologies,
cloud services, networking capacity, hardware, and software, data flow
monitoring and data collection play a vital role for network
administrators, and the number of security analysts has grown rapidly in
recent years. Cyberattacks are growing more complicated, clever, and
targeted. Despite these complications, the essential functions of data
flow monitoring, data collection, data processing, rapid data analysis,
and monitoring the security of the system remain unaltered. They play a
critical role in detecting and responding to network intrusion. Many
leading companies have begun to collect new categories of network data.
These capabilities enable security experts to obtain a better
understanding of their network's activities and to assess its security
more effectively.

Figure 4.3 Proposed work. (Recoverable block labels: Framing of the
Expected Traffic Status; Traffic Filtering using Regression.)


4.3.1.2. Purpose of Data Flow Monitoring and Data Collection 


It is the role of network administrators or security experts to study the
data that contains critical information. The majority of the data flowing
in the network is free from critical information; only a very small
percentage contains it. The expert's role is therefore to locate these
bits correctly, gather all affected bits, analyze them, and safeguard them
from ordinary users. Proactively, security experts may use real-time
network data monitoring, testing, and analysis to assist in uncovering
network vulnerabilities, assessing performance, evaluating service levels,
and even detecting suspicious activities. Regardless of the form in which
individual network data is organized, different security risks, assaults,
and intrusions can create variances in network traffic type, volume,
source, and destination.


4.3.1.3 Types of Collection 


Everyone who uses the internet is producing data. The nature of data pro- 
duced by individuals, companies, and hosting agencies varies depending 
on the purpose. The purpose may be training data, data used for testing, 
detection of vulnerabilities, or any incidental purpose. Data collection and 
analysis are based on various factors such as the quality of collections, tools 
available for collection, and various capacities to efficiently and effectively 
analyze the data after collection. Because of these, network data collec- 
tion and analysis are becoming a complicated task. For all of these reasons, 
companies and security professionals must understand the many sorts 
of data that may be collected and, ultimately, what the data can tell them 
about what is happening or has happened within the network. 

Packets are very important for traffic analysis in the network. A packet
carries a lot of important data, and packets are used by all devices, from
basic mobile phones to large-scale industrial systems. Because of this,
capturing packets is the most common way of collecting data at a
particular place and time. A packet consists of the packet header, the
payload (the original message), and the trailer (which marks the end of
the message). The packet header is responsible for delivering the packet
to the destination address. It contains information such as the source
address, destination address, source port, destination port, and protocol
type. This information helps in packet delivery from source to destination
through the network. The payload contains the original information, either
as it is or in encrypted form.
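
The header fields listed above can be read from raw bytes with a short sketch. It assumes an IPv4 packet whose transport header immediately follows the IP header; the `parse_packet` helper and the hand-built sample packet are illustrative and are not part of the proposed system.

```python
import socket
import struct

PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}


def parse_packet(raw):
    """Extract addresses, ports, and protocol from a raw IPv4 packet."""
    (ver_ihl, _tos, total_len, _ident, _frag, _ttl, proto, _cksum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    header_len = (ver_ihl & 0x0F) * 4  # IHL counts 32-bit words
    fields = {
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
        "protocol": PROTOCOLS.get(proto, str(proto)),
        "total_length": total_len,
    }
    if proto in (6, 17):  # TCP and UDP headers both start with the two ports
        src_port, dst_port = struct.unpack(
            "!HH", raw[header_len:header_len + 4])
        fields["src_port"] = src_port
        fields["dst_port"] = dst_port
    return fields


# Hand-built sample: IPv4 header + UDP header + payload (checksums left at 0).
payload = b"hello"
udp = struct.pack("!HHHH", 40000, 53, 8 + len(payload), 0) + payload
ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(udp), 1, 0, 64, 17, 0,
                 socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))

print(parse_packet(ip + udp))
```

The same dictionary of fields is what the later feature-selection and flow-aggregation steps would consume.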


Source Addresses (IP_src) and Destination Addresses (IP_dst)

For packet transmission between two nodes, the packet must carry the
source (originating) IP address, IP_src, and the destination IP address,
IP_dst. When gathered, these may also be used to determine “normal”
traffic. A packet capture, for instance, might indicate a DDoS attack or
some other attack, such as a botnet, when the IP_src values of the traffic
fall outside the standard IP ranges or are more concentrated than those of
genuine users.


Source Port (Port_src) and Destination Port (Port_dst)

Transmitted packets are routed to various ports according to their needs,
and an examination of port frequency over time can create a baseline for
monitoring abnormalities. Security analysts carry out further analysis and
investigation when any of the following happens:

• abnormal port scans
• abnormal traffic at any of the ports
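
One simple way to flag the first abnormality in the list is to count how many distinct destination ports each source contacts. The sketch below is an illustration only; the `(src_ip, dst_port)` tuples and the `max_ports` limit are hypothetical and would in practice be derived from the observed baseline.

```python
from collections import defaultdict


def distinct_ports_per_source(packets):
    """Map each source IP to the number of distinct destination ports seen."""
    ports = defaultdict(set)
    for src_ip, dst_port in packets:
        ports[src_ip].add(dst_port)
    return {src: len(seen) for src, seen in ports.items()}


def suspected_scanners(packets, max_ports):
    """Sources that contacted more than `max_ports` distinct ports."""
    counts = distinct_ports_per_source(packets)
    return sorted(src for src, n in counts.items() if n > max_ports)


# An ordinary client uses two ports; a scanner probes ports 1 to 100.
packets = [("10.0.0.5", 80), ("10.0.0.5", 443), ("10.0.0.5", 80)]
packets += [("10.0.0.9", port) for port in range(1, 101)]

print(suspected_scanners(packets, max_ports=20))
```

The second abnormality, unusual traffic at a single port, could be handled in the same way by counting packets per destination port instead of ports per source.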


Packet Content 

Every generated packet has two parts, namely the packet header
(responsible for packet delivery) and the packet content (the original
message and a trailer that denotes the end of the message). For security
purposes, both parts may be examined. Antivirus software, for example,
examines packet headers to see whether extraordinary quantities of network
activity are targeting known vulnerable programs or carry strange source
IP information. Similarly, the packet content may be examined to see
whether malicious code or odd application commands are present, which can
indicate external cyberattacks. However, the transmission protocol of the
packet, as well as the types and placement of the security and monitoring
devices used, might limit packet content examination.


Flow-Level Data 

Considering present internet speed and data transfer, one has to think 
beyond collecting and analyzing packet-level data. As a result, the practice 
of collecting flow-level data has grown, offering a macro-level perspective 
of network activity by analyzing groups of packets that have similar des- 
tinations, sources, protocol types, or other information indicated in the 
packet’s header. Flow analysis can help monitor network performance, 
application health, or host activity by analyzing similar packets together, as 
well as identifying unexpected network traffic that may indicate a poten- 
tial intrusion. When adopting flow-level data collection, businesses must 
decide where the data will be collected as well as the extent of that data 
collection. First, while flow data collecting can occur at any point inside a
network and even at several network points at the same time, it is generally
most successful when linked with network edge nodes that monitor data 
going in and out of a local area network. At the same time, flow data may 
be acquired using either a “depth-first” or a “breadth-first” strategy. The 
former narrows the kind of flow data gathered to fit certain criteria found 
in packet headers, whereas the latter aims to collect as much information 
as possible in order to gain a comprehensive picture of overall network 
activity. 
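
Grouping packets into flows can be sketched as follows. The example assumes that each packet has already been reduced to a dictionary of header fields; the field names and the choice of the five-tuple as the flow key are illustrative.

```python
from collections import defaultdict


def aggregate_flows(packets):
    """Group packets that share the same five-tuple into flow records."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p["src_ip"], p["dst_ip"], p["src_port"], p["dst_port"],
               p["protocol"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p["length"]
    return dict(flows)


packets = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 40000,
     "dst_port": 80, "protocol": "TCP", "length": 60},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 40000,
     "dst_port": 80, "protocol": "TCP", "length": 1500},
    {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.2", "src_port": 5353,
     "dst_port": 53, "protocol": "UDP", "length": 90},
]

for key, stats in aggregate_flows(packets).items():
    print(key, stats)
```

A depth-first strategy would filter the packets on selected header values before this aggregation, while a breadth-first strategy would aggregate everything.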


Connection-Based Data 

So far, we have discussed flow-based and packet-based data collection.
Both act like a black box that provides comprehensive network information
for review. Connection-based data, on the other hand, acts like a white
box and provides a deeper level of understanding of the traffic flow in an
environment. It aggregates the traffic of the network between any two
parties.


4.3.2 Feature Selection Using Entropy 


Initially, the abnormal status of the traffic flow is detected using
Rényi's quadratic entropy. The Rényi entropy is named after Alfréd Rényi,
who looked for the most general way to quantify information while
preserving additivity for independent events. As shown in Figure 4.4, the
port addresses (Src_Port, Dst_Port), IP addresses (Src_IP, Dst_IP),
physical addresses (Src_MAC, Dst_MAC), VLAN ID, and Duration are given as
input to Rényi's quadratic entropy. Identical nodes are detected with
Rényi's quadratic entropy using equation 4.1.

Figure 4.4 Feature selection. (Recoverable labels: Router; Traffic Entry 1,
Traffic Entry 2, ..., Traffic Entry N.)

The Rényi entropy is significant in quantum information because it can be
used to quantify entanglement. In the Heisenberg XY spin chain model, the
Rényi entropy can be computed explicitly as a function of α, because it is
an automorphic function with respect to a particular subgroup of the
modular group. As α tends to zero, the Rényi entropy weighs all events
with nonzero probability increasingly equally, irrespective of their
actual probabilities. In the limit α → 0, the Rényi entropy is simply the
logarithm of the size of the support of X. The Rényi entropy of order α is
defined for 0 < α < ∞ and α ≠ 1. The Rényi divergence is defined for the
special values α = 0, 1, ∞ by taking a limit, in particular the limit
α → 1.


$$
H(B) =
\begin{cases}
-\log \sum_{i} P_i^{2}, & \text{where } P_i = n_i / n \\
\text{minimal} & \text{for concentrated samples} \\
0 & \text{for identical samples}
\end{cases}
\tag{4.1}
$$
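
The behaviour stated in equation 4.1 can be checked numerically with a small sketch. It assumes that P_i is the relative frequency of the i-th distinct value of a feature within the observation window and that the natural logarithm is used; the function name and sample values are illustrative.

```python
import math
from collections import Counter


def renyi_quadratic_entropy(values):
    """Renyi entropy of order 2 of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    # Sum of squared relative frequencies (the collision probability).
    collision = sum((c / n) ** 2 for c in counts.values())
    return math.log(1.0 / collision)


identical = ["10.0.0.9"] * 8                       # every entry is the same
concentrated = ["10.0.0.9"] * 7 + ["10.0.0.1"]     # one value dominates
spread = ["10.0.0.%d" % i for i in range(1, 9)]    # all entries differ

print(renyi_quadratic_entropy(identical))
print(renyi_quadratic_entropy(concentrated))
print(renyi_quadratic_entropy(spread))
```

The entropy is zero when all entries are identical, stays small when one value dominates, and reaches its largest value when every entry differs, which is what allows a drop in entropy to signal traffic concentrated on a few addresses or ports.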


4.3.3. Malicious Traffic Detection 


The internet is all about the movement of data, or traffic. Whenever a
device is connected to the internet, data flows between sender and
receiver. We cannot assume that all incoming data is good: malicious
content may arrive along with the wanted data, and it must be prevented
from accessing our personal system. It is the purpose of your antivirus
solution's malicious traffic detection capability to keep your computer
safe. If we were to rank the different elements of your endpoint security
in order of priority, detecting malicious traffic would undoubtedly come
top. A malicious traffic detection system continually analyses traffic for
any indications of suspicious links, files, or connections that are made
or received. Advanced malicious traffic detection can determine whether a
suspicious link originates from bad URLs or command and control (C2)
sites. Typically, it compares the link to a massive quantity of security
data collected from hundreds of millions of devices around the world. This
safeguards against both known and unknown dangers. When malicious HTTP
requests reach the command and control servers, the servers send a message
to your compromised PC or Mac, enlisting it in a larger zombie army known
as a botnet. This communication may be as simple as keeping a timed beacon
on your PC so that the attackers who have hijacked it can keep track of
how many such PCs are in their inventory (yes, they have an inventory!).
Alternatively, attackers can give orders to initiate harmful acts such as
data theft or a ransomware attack. The malware must gain access to your
system in order for a command and control assault to take place. This is
most commonly accomplished through phishing emails and social engineering
assaults.


4.3.3.1 Framing of the Expected Traffic Status 


Kaufman and Rousseeuw presented the K-Medoids method (also known as
Partitioning Around Medoids) in 1987. A medoid is defined as the point in
a cluster that has the least dissimilarity to the other points in the
cluster. The dissimilarity between a medoid (Ci) and an object (Pi) is
calculated as E = |Pi - Ci|, and the resulting cost is given in equation
4.2.


$$
c = \sum_{C_i} \sum_{P_i \in C_i} \lvert P_i - C_i \rvert \tag{4.2}
$$


4.3.3.2 Traffic Filtering Using Regression


Multinomial regression is used to segregate the traffic, as given in
equations 4.3 and 4.4.


Probyr =1-D.%) prob(Z, =0|Y) erin (4.3) 


n(Y) 
Probar = er i m Dep fun(Y) 
SS ean (4.4) 


Normal traffic and malicious traffic are detected using equations 4.3 and
4.4.
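
As an illustration of how a multinomial regression separates traffic classes, the sketch below uses the standard softmax form of the model. It is not a transcription of equations 4.3 and 4.4; the two features, the weights, and the biases are hypothetical values chosen only to make the example run.

```python
import math


def class_probabilities(features, weights, biases):
    """Softmax probabilities of a multinomial logistic regression model."""
    scores = [b + sum(w * x for w, x in zip(ws, features))
              for ws, b in zip(weights, biases)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases, labels):
    """Label of the class with the highest probability."""
    probs = class_probabilities(features, weights, biases)
    return labels[probs.index(max(probs))]


labels = ["normal", "malicious"]
# Hypothetical model: feature 0 = packet rate, feature 1 = entropy.
weights = [[-1.0, 1.0], [1.0, -1.0]]
biases = [0.0, 0.0]

print(classify([0.5, 1.5], weights, biases, labels))
print(classify([3.0, 0.2], weights, biases, labels))
```

In a trained model the weights would be fitted to labelled traffic; here a high packet rate combined with low entropy is simply assumed to indicate an attack.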


4.3.4 Traffic Mitigation 


The first step is to select the medoids by choosing k random points from
the given pool of n data points. Any conventional distance metric can then
be used to connect each point to its nearest selected medoid. The
following steps are repeated for as long as the cost decreases. For each
medoid point a and non-medoid point b, swap a and b, then assign each
point to the medoid nearest to it and recalculate the cost. If the
recalculated cost is higher than the previous cost, undo the swap;
otherwise, keep it and repeat the process.
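
The swap procedure can be sketched in a few lines. This is a minimal illustration that assumes points are tuples of numbers and uses the Manhattan distance as the dissimilarity; the function names and the sample points are hypothetical.

```python
import random


def manhattan(p, q):
    """Dissimilarity |P - C| summed over the coordinates."""
    return sum(abs(a - b) for a, b in zip(p, q))


def total_cost(points, medoids):
    """Sum of the distances from each point to its nearest medoid."""
    return sum(min(manhattan(p, m) for m in medoids) for p in points)


def k_medoids(points, k, seed=0):
    """Partitioning Around Medoids with greedy swaps."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)  # step 1: k random points
    cost = total_cost(points, medoids)
    improved = True
    while improved:  # repeat for as long as the cost decreases
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                # Swap medoid i with a non-medoid point and re-evaluate.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < cost:  # keep the swap only if it is cheaper
                    medoids, cost = trial, trial_cost
                    improved = True
    labels = [min(range(k), key=lambda j: manhattan(p, medoids[j]))
              for p in points]
    return medoids, labels, cost


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
medoids, labels, cost = k_medoids(points, k=2)
print(sorted(medoids), cost)
```

A swap that does not lower the cost is simply not applied, which has the same effect as undoing it.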


4.4 Implementation and Results 


The experiments for configuring the SDN are implemented on a Mininet SDN
testbed. Mininet is an SDN emulator. Synthetic traffic is generated using
the hping3 command-line utility. Mininet is configured on a Dell Inspiron
PC with 16 GB RAM, an Intel Core i7-1011 CPU, and a 64-bit OS, running
Ubuntu. Synthetic traffic is mitigated using the POX controller. The POX
controller collects the traffic every 30 s from all the OVS (OpenFlow
vSwitches) associated with the SDN environment and implements new
mitigation rules to deny malicious traffic from entering home networks.
Table 4.2 shows the dataset details for the synthetic and KDD datasets,
including the trace/packet types and their percentages. Several evaluation
parameters were used to evaluate the classification efficiency of the
proposed system: Accuracy (AC), Detection Rate (DR), Precision (PR), True
Negative Rate (TNR), and False Alarm Rate (FAR).
AC: Based of all the observations in the study tests, it counts the amount of 
instances the model properly detected. The model considers both TP and 
TN while determining its accuracy given in equation 4.5. 


_ (TP +TN) 
(TP +FP+TN + FN) 


(4.5) 


PR: By partitioning the entire amount of categorized assault inspection, 
the model represents the amount of recognized assaults and the insights of 
those assaults that were found given in equation 4.6. 


TP 


| ae 
(TP + FP) Meo} 


FAR: To correlate with regular insights designated as an assault, the 
entire amount of regular facts in the dataset is halved by the entire amount 
of regular facts given in equation 4.7. 


ML Basep THREAT DETECTION ON SDN FoRI4.0 97 


Table 4.2 Dataset description. 


Details                        | Synthetic dataset | KDD dataset [29]
Trace type and traffic         | TCP: 60%          | TCP: 78%
percentage                     | UDP: 32%          | UDP: 12%
                               | ARP: 5%           | ARP: 3%
                               | ICMP: 3%          | ICMP: 7%
Time window (traffic request)  | 30 seconds        |
Number of traces               | 1278225           | 1068800


$$\mathrm{FAR} = \frac{FP}{FP + TN} \tag{4.7}$$


TNR: It is the proportion of actual normal cases that the detection method
predicts to be normal, as given in equation 4.8.


$$\mathrm{TNR} = \frac{TN}{TN + FP} \tag{4.8}$$
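
The evaluation parameters can be computed directly from the confusion-matrix counts. The sketch below follows the definitions of AC, PR, FAR and TNR given above. The DR formula, TP / (TP + FN), is the usual definition of detection rate and is an assumption here, since it is not written out in the text. The function name and the counts in the usage note are hypothetical and are not results of the experiments.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return the evaluation parameters as fractions in [0, 1]."""
    return {
        "AC": (tp + tn) / (tp + fp + tn + fn),  # accuracy
        "PR": tp / (tp + fp),                   # precision
        "DR": tp / (tp + fn),                   # detection rate (assumed: recall)
        "FAR": fp / (fp + tn),                  # false alarm rate
        "TNR": tn / (tn + fp),                  # true negative rate
    }


# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=80, fp=10, tn=90, fn=20)
```

With these counts, AC is 0.85, DR is 0.8, FAR is 0.1 and TNR is 0.9. FAR and TNR always sum to 1, because both are fractions of the same set of normal records.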

Figure 4.5 shows that, before the mitigation services are applied, the
traffic contains both malicious and normal traffic. Packet retransmission
occurs due to the malicious traffic. Figure 4.6 shows that malicious traffic
is reduced after the mitigation process is implemented. In Figures 4.5 and
4.6, normal traffic is shown as the gray area and malicious traffic as the
orange area.

Figure 4.5 Before mitigation.

Figure 4.7 shows the time bins for normal traffic; the proposed system
performs better than AD-NB and AD-MLP on normal traffic processing.
Figure 4.8 shows the time bins of the proposed system when implementing
the mitigation process for different volumes of data; here too it performs
better than AD-NB and AD-MLP.


[Figure: Normal Traffic Time Bins. x-axis: Data Volume (100, 1000, 2000,
10000); y-axis: Time Bins (ms); series: Proposed Technique, AD-NB, AD-MLP.]


Figure 4.7 Time bins for normal traffic processing. 


[Figure: Time Bins for Mitigation.]


Figure 4.8 Time bins for mitigation process.


Figure 4.9 shows the CPU usage of malicious attack detection for different
volumes of data. The proposed system uses POX controller-based attack
detection; the POX controller performs mitigation services based on a
dynamic access control list.

Figure 4.9 CPU usage on attack detection.


Table 4.3 Performance of attack detection for synthetic data set.


Detection penn Accuracy | Precision | Recall | F -measure | FP rate 
al 4 i fe pe cae 


Proposed DDoS for | 78. 78.31 | 86.19 | 19 2. 2.82 | 


Table 4.4 Performance of attack detection for KDD data set. 


Detection po Accuracy | Precision | Recall | F -measure | FP rate 
technique 4 a el ral ra 


Proposed |DDos_| f90 | 79. | 1 87.19 | 19 


Table 4.3 reports malicious attack detection on the simulated (synthetic)
dataset: the proposed technique performs better than AD-NB and AD-MLP in
terms of attack detection accuracy, precision and FP rate. Table 4.4 shows
that the proposed technique also produces good results in terms of attack
detection accuracy and FP rate compared to AD-NB and AD-MLP on the KDD
data set.


4.5 Conclusion 


In this research, the proposed techniques are used to detect DoS and DDoS
traffic and to perform mitigation services against malicious floods. The
proposed systems are implemented on the Mininet SDN emulator, and synthetic
traffic is generated using the hping3 command-line utility. The POX
controller uses a dynamic access control list to implement the mitigation
process. The proposed techniques achieve higher attack detection accuracy
and a lower FP rate than the AD-NB and AD-MLP techniques. In future work,
detection of other types of malicious attack and network load-sharing
concepts will be implemented using the POX controller.


Abstract

Nearly everyone has at least one internet-connected gadget these days. As the
number of these gadgets grows, it is critical to create a security policy to
reduce the risk of abuse. Malicious actors may exploit internet-connected
gadgets to gather private information, hijack identities, compromise banking
details, and covertly monitor users. Taking a few safeguards in the setup and
use of the gadgets helps stop this kind of behavior. Although Wireless Sensor
Networks (WSNs) have a wide range of uses, their security is becoming
increasingly crucial as sensor nodes get more intricate. The placement of
sensors in isolated places and the geographic spread of WSNs create significant
security risks, and researchers are now looking into potential solutions. A
crucial component of WSN security is discussed and summarized. Numerous
encryption methods, including symmetric-key and public-key methods, are
investigated. These methods include identity-based cryptography (IBC),
pairing-based encryption, and elliptic curve cryptography. This article
proposes a data protection method built on an enhanced version of the
Diffie-Hellman algorithm that requires less computation and response time. By
hashing each value that is sent over the network, Diffie-Hellman is made more
secure against attacks. The proposed method's security against different
attacks has been examined, as has the time it requires for encryption,
decryption, computation, and key creation for data of various sizes. In
comparison with existing methods, the proposed strategy outperforms them in
most instances.


Keywords: Wireless sensor networks, cryptography, encryption, decryption, 
pseudo random number generator, LEACH, Diffie Hellman algorithm 


5.1 Introduction 


A WSN is a flexible network with several kinds of nodes, depending on the
networking mechanism used: sink nodes, a ground station (GS), sensor nodes,
and cluster heads. The sensors deliver data to the cluster heads, which
forward it to the GS via a modern communication procedure. Sensors are used
to detect and send data in a variety of settings. In real-time applications
they perform tasks such as neighbor node detection, intelligent sensing,
confidentiality objective checking, tracing, node localization,
synchronization, and effective routing across nodes and GSs.

Secure data transfer is a major concern in WSNs, as numerous adversaries on
the link can attack or forge the data. This WSN data protection study
focuses on work published between 2016 and 2022. Sensor networks have
constrained computing, memory, and power capabilities [Akyildiz et al.
(2002)]. A substantial number of studies have been published on security in
WSNs. Elliptic Curve Cryptography (ECC) is employed in several proposed
methods [Elhoseny et al. (2016) and Mahmood et al. (2019)]. Ullah et al.
(2018) use it in conjunction with the Advanced Encryption Standard (AES),
with the key created via ECC. AES, however, is a block cipher whose
processing time grows with the length of the keys and the message.

To provide secure data exchange, ECC is frequently combined with
deoxyribonucleic acid (DNA) encoding [Tiwari et al. (2018)]. Chaotic mapping
is a highly secure quantitative approach, and Rivest-Shamir-Adleman (RSA) is
widely employed in WSNs to protect data [Wang et al. (2018), Ahmad et al.
(2018)]. Because the WSN is a complex ecosystem, a few proposed methods,
such as AES hybrid Elliptic Curve Cryptography (HECC) [Tiwari et al. (2018)]
and Elliptic Curve Cryptography Genetic Algorithm (EGASON) [Mahmood et al.
(2019)], are thought to be the most suitable. This study offers a novel
cryptographic technique to secure data in wireless sensor networks. For data
forwarding, the low-energy adaptive clustering hierarchy (LEACH) algorithm
is used. For data encryption, the proposed method uses a hybrid
Diffie-Hellman scheme. Because Diffie-Hellman is an asymmetric technique, it
provides a high level of security. The resulting encryption method has a
lower response time and time complexity.

The rest of the article is organized as follows. Section 5.2 presents the
WSN setting. Section 5.3 reviews the research related to the proposed
approach. Section 5.4 details the proposed algorithmic technique employed to
secure the data transmitted in WSNs and IoT networks, and Section 5.5
presents the experimental results.


5.2 System Architecture 


A WSN is made up of several sensor nodes that interact with one another. A
sink node and a GS help relay data between any two communicating sensors.
The main idea behind WSNs is to transport data among nodes remotely. Figure
5.1 depicts a high-level representation of a WSN, including sink nodes,
sensors, and the GS.


[Figure: WSN architecture diagram; legible label: Ground Station.]


Figure 5.1 System architecture. 


5.3 Literature Review


Data security in WSNs is a key concern. The main difficulty is to protect
the information using an approach with a short computation time, a quick
response time, minimal energy usage, and restricted bandwidth. A number of
algorithms have been proposed to protect data while overcoming obstacles
such as power shortages, coverage concerns, and limits on data usage. The
review below is organized around the following topics: elliptic curves,
effective block ciphers, AES, chaotic maps, RSA, and several additional
algorithms.

Elhoseny et al. (2016) developed a new approach known as EGASON. This
method employs ECC for generating keys, as discussed by [Mahmood et al.
(2019)], and then employs an evolutionary (genetic) algorithm for encrypting
and decrypting data. In contrast to symmetric-key processes, ECC may be
vulnerable to brute-force attacks and has a large computational cost
[Chaudhry et al. (2020), Mansoor et al. (2019)]. The approach in Somsuk et
al. (2018) has the potential to break the pseudo-random number generators
(PRNGs) used. Santhosh et al. (2017) demonstrate strong encryption. From a
security standpoint, however, their method employs only the exclusive OR
(XOR) function for encryption, which leaves it vulnerable to known-plaintext
and chosen-plaintext attacks [Daemen et al. (1991)].
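
To illustrate why encryption that relies only on XOR is exposed to known-plaintext attacks, the following sketch recovers a repeating XOR key from one ciphertext and a matching plaintext fragment. The repeating-key construction, the key, the message, and the assumption that the attacker knows the key length are all hypothetical; the code does not reproduce any of the schemes reviewed here.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


key = b"k3y!"                              # hypothetical secret key
plaintext = b"sensor reading: 21.5C"       # hypothetical message
ciphertext = xor_bytes(plaintext, key)

# An attacker who knows the first len(key) plaintext bytes recovers the key,
# because (p XOR k) XOR p = k.
known = plaintext[:len(key)]
recovered_key = xor_bytes(ciphertext[:len(key)], known)

# With the key recovered, the whole message can be read.
decrypted = xor_bytes(ciphertext, recovered_key)
```

Here `recovered_key` equals `key` and `decrypted` equals `plaintext`, so a single known fragment is enough to expose every message encrypted under the same key.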

A further study introduced elliptic curve cryptography-key management
(E-KM) [Singh et al. (2017)], which includes a key structuring mechanism.
This approach is vulnerable to collision search attacks [Wiener et al.
(1998)]. More recent research confirms that implementing ECC exposes it to
invalid curve attacks [Neves et al. (2017)]. For secure transmission in
WSNs, Viswanathan et al. (2019) employ elliptic curve key cryptography using
beta and gamma factors. A number of approaches employ ECC for WSNs,
resulting in a demand for a new secure encryption technique for WSNs. As a
novel key extraction strategy, Ghani et al. (2019) employed symmetric-key
cryptography for device authentication within WSNs, as well as for deriving
shared keys for both encryption and decryption.

Ullah et al. (2018) presented HECC as a novel key generation technique. The
secret key supports encryption and decryption as well as node
authentication. The random number generator used in this study can be broken
using Reeds' approach [Reeds (1977)]. Because AES is used to guarantee data
protection, the scheme is also vulnerable to biclique attacks [Bogdanov et
al. (2011)].

Tiwari et al. (2018) employ ECC and DNA for encryption and decryption. DNA
is used to assign genes, and these genes are employed for encrypting data.
The scheme is safe from timing and Simple Power Analysis (SPA) attacks; it
is, however, vulnerable to Man-in-the-Middle (MIM) attacks. Suresh et al.
(2017) conducted a study on the IoT ecosystem. Double cryptographic
architecture-secure network connection (D-SN) is an information security
technique that encrypts information with DNA gene sequences using RSA and
the Data Encryption Standard (DES). Its short public and private key lengths
raise a security concern. Nitaj et al. (2014) and Weger et al. (2002)
employed RSA in a scheme divided into four steps: key generation and
distribution, encoding, en route screening, and routing. The use of RSA
renders this technique vulnerable to Wiener and Boneh-Durfee attacks [Roy et
al. (2018)].

MRSA (Modified Rivest-Shamir-Adleman) is a further encryption method that
enhances the existing RSA [Manger et al. (2001)]. This approach is
vulnerable to a chosen-ciphertext attack. Furthermore, energy usage rises as
a result of using three primes. The study by Babu et al. (2019) uses the
elliptic-curve Diffie-Hellman Key Extraction (ECDH-KE) process to provide
end-to-end secrecy while also ensuring authentication and synchronization.
However, this technique is vulnerable to a chosen-ciphertext attack due to
its use of RSA [Lindell et al. (2014)], and because of ECDH it is subject to
MIM attacks [Haakegaard et al.].

Elliptic Curve Cryptography-Advanced Encryption Standard (E-AES) is another
method that uses ECC to produce keys for encryption and decryption. This
process is extremely secure, but it is complex and entails a significant
communication cost [Ullah et al. (2018)]. Li et al. (2017) present an
improved variant of AES. AES, however, is vulnerable to biclique attacks,
and the secret key in this method is vulnerable to related-key attacks.
Another method that employs AES is Advanced Encryption Standard-Quadrature
Phase Shift Keying (AES-QPSK), used with or without a low-density
parity-check (LDPC) code [Khan et al. (2017)]. This method may likewise be
vulnerable to biclique and related-key attacks. Vangala et al. (2017) employ
AES for data encryption with a hybrid mutation approach. This operation is
exposed to various attacks [Albassal et al. (2003)].

The work in Vangala et al. (2018) employs a modified AES to generate the
factors and the first seed. This strategy is highly exposed to biclique
attacks. The pseudorandom number generator (PRNG) algorithm is vulnerable to
direct cryptanalytic attacks, input-based attacks, backtracking attacks, and
other attacks. The approach in Reeds may be used to break the PRNG. The
blended chaotic cryptanalysis with piecewise linear chaotic map (HTPW) is a
novel approach to key management that makes use of skew and mappings
[Al-Mashhadi et al. (2015)]. Because only the XOR operation is employed for
protection, this solution is vulnerable to known-plaintext and
chosen-plaintext attacks. A further way of safeguarding data in WSNs is
enhanced sensitive data utilizing chaotic-based encryption (EDCB), in which
the keys are generated using chaotic maps. Chaotic systems can indeed
generate unique values; however, their use renders this approach vulnerable
to ciphertext-only, known-plaintext, chosen-plaintext, and chosen-ciphertext
attacks.

The work in [Nidarsh et al. (2018)] employs the LEACH technique for packet
transmission and chaotic maps for encryption and decryption. According to
Sobhy et al. (2001), chaotic maps may be broken using a variety of
approaches. Wang et al. (2018) used logistic and Kent chaotic maps. This
system has a few downsides, including weak passwords, constant chaotic
sub-matrices, and insensitivity to the plain image. QTL is an
ultra-lightweight block cipher proposed by Li et al. (2016). This technique,
however, is vulnerable to differential and polynomial attacks. Patil et al.
(2017) introduce an alternative ultra-lightweight secure hash technique,
linear cryptanalysis (LC). This method is vulnerable to known-plaintext and
chosen-plaintext attacks.

Maity et al. (2017) offer a lightweight method named lightweight
pseudo-random number generator (LGA). A known attack on this method using
permutations is feasible. Solomon et al. (2018) propose ciphertext-policy
attribute-based (C-AB) cryptography as a lightweight approach to encryption
and authentication-code generation. Encryption with this technique protects
data from eavesdropping attacks. Differential attacks may affect the Secure
Hash Algorithm (SHA)-3, and the decision-supporting system (DSS) is
susceptible to timing attacks [Kocher et al. (1996)]. Praveena (2017)
introduces the Ultra-Encryption Standard Version 4 (UES-4) technique, which
combines many existing cryptographic methods. Bitwise reorganization is
applied to the raw text, followed by bitwise tabular transit for unreadable
material.

Because several rounds of encryption are performed, the text becomes
extremely difficult to guess, making UES-4 resistant to brute-force attacks.
This approach is, however, susceptible to ciphertext-only attacks (COA). The
low intricacy secure algorithm (LSA) is a data protection scheme presented
in [Li et al. (2007)]. Its initial stage is to track the development of the
networks. An XOR operation is then applied to the plaintext and the key.
Because the inputs are merely XORed, it requires little computing time, but
XOR's resistance against brute-force attacks is poor.

Ananda Krishna et al. (2018) introduce Modified Rotation XOR (MR-XOR), a
revised variant of basic XOR. The XOR procedure is susceptible to
brute-force attacks. For key creation, Praveena et al. (2016) presented
Modern Encryption Standard-Version 4 (MES-IV). This approach provides
excellent defence against known-plaintext, brute-force, and differential
attacks. Because the ciphertext may be obtained, however, it is vulnerable
to side-channel attacks. For data protection, another study offers an
electronic signature utilizing key management (DK) [Kumar P S et al.
(2022)]. Its asymmetric technique increases the computing time. The work of
Gracy et al. (2018) employs honey encryption (HoneyE) to create confusion
for an attacker. A key recovery attack against this encryption is also
possible.

Yue et al. (2019) adopted a hybrid method. This technique encrypts plaintext
blocks using the Advanced Encryption Standard (AES) and Elliptic Curve
Cryptography (ECC), then compresses them to produce ciphertext blocks.
Following that, it attaches the port number and the AES key encrypted by ECC
to generate the full ciphertext document. The authors of [Sountharrajan et
al. (2020)] state that this approach cut encryption time and boosted
privacy. Although encryption time decreases, using AES and ECC on the
sensors might shorten the sensors' lifespan, resulting in dead sensor nodes
and connectivity issues [Karthiga et al. (2021)].

The study by Ullah et al. (2018) employs AES for encrypting data, with
secret keys created via HECC. The technique provides both forward and
backward secrecy; nevertheless, it may be compromised through its Random
Number Generator. Because AES is used, the method is subject to biclique
attacks. Another technique, in Manger, alters RSA by using three primes
instead of two, making brute-force attacks more difficult [Kumar et al.
(2018)]. However, the use of RSA renders this technique vulnerable to a
chosen-ciphertext attack.

Information security in WSNs is a major challenge, with insufficient privacy
resulting in related-key, plaintext, or MIM attacks [Sountharrajan et al.
(2017)]. Plaintext attacks and MIM attacks might both affect EGASON. Another
significant difficulty is that present data encryption techniques have very
long response times or very long computation times [Shree et al. (2017)].
AES-HECC [Lasry et al. (2016)] has a long computation time, while EGASON
[Prusty et al. (2012)] has a long response time. Based on the work reviewed
above, a novel secure encryption technique that consumes less computing time
and responds faster is proposed.


5.4 Proposed Methodology 


This section explains the design of the system and the proposed techniques.
An Advanced Secured Effective Encryption Algorithm (ASEEA) is introduced.
The keys used by ASEEA are generated with a customized Diffie-Hellman (CDH)
technique. The input is encoded with ASEEA and CDH and finally routed using
LEACH. All of these techniques are addressed in depth in the subsequent
sections.

The proposed strategy is broken into three stages. Each stage is intended to
take as little processing and response time as possible. The three stages
are listed below.


Stage 1. Effective Hashing of the input 
Stage 2. ASEEA and CDH 
Stage 3. LEACH. 


LEACH is used in WSN applications to give two users a secured data
connection. LEACH provides communication security amongst sensors and
safeguards against a variety of attacks. ASEEA is a novel and economical
cryptographic algorithm for WSNs, intended to have a low computation and
response time. Because the proposed technique does not require any
additional complex stages, it has the benefit of lowering processing time.
The technique is separated into two stages: the first generates the keys,
and the second applies the proposed cryptographic method.


Generating Secret Keys 

When generating the secret key, the parties seeking to establish
communication agree on constant prime numbers 'J' and 'K'. The numbers are
then employed in the following expression: K mod J. After the parties have
determined J and K, this expression is used to construct a new value, which
is then handed to both partners to help them generate their own encryption
key using their confidential values. The operation of the original
Diffie-Hellman technique is detailed in Diffie et al. (1976). Because
Diffie-Hellman is very susceptible to MIM attacks, an enhanced, highly
secure customized variant of Diffie-Hellman called CDH is introduced.
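
For orientation, the unmodified Diffie-Hellman exchange can be sketched as follows. The sketch assumes that J plays the role of the public prime modulus and K the role of the public base, which is one reading of the expression K mod J above. The function names, the private-value names and the toy parameters J = 23 and K = 5 are hypothetical, and the parameters are far too small for real use.

```python
import secrets


def dh_public(base, private, modulus):
    """Public value sent to the other party: base^private mod modulus."""
    return pow(base, private, modulus)


def dh_shared(other_public, private, modulus):
    """Shared secret: other_public^private mod modulus."""
    return pow(other_public, private, modulus)


J, K = 23, 5                                  # toy public modulus and base (assumed roles)
private_m = secrets.randbelow(J - 3) + 2      # M's confidential value in [2, J - 2]
private_n = secrets.randbelow(J - 3) + 2      # N's confidential value in [2, J - 2]

public_m = dh_public(K, private_m, J)         # sent from M to N
public_n = dh_public(K, private_n, J)         # sent from N to M

key_m = dh_shared(public_n, private_m, J)     # computed by M
key_n = dh_shared(public_m, private_n, J)     # computed by N
```

Both parties arrive at the same key because (K^a)^b and (K^b)^a are equal modulo J. Nothing in this exchange authenticates the public values, which is the gap a MIM attacker exploits.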


CDH (Customized Diffie-Hellman) 

MIM attacks are increasingly widespread on the internet, because data may be
quickly modified in transit. CDH aids in the prevention of MIM attacks on
Diffie-Hellman. The key produced by any cryptographic means must be kept
safe, or else the entire algorithm's security would be compromised. Hashing
has long been used to protect the connection between communicating parties.
The main idea behind hashing is to turn the source data into a
representation that is unintelligible and challenging to decipher. Only the
parties involved can compute and check the hash codes, which allows hashing
to function as a validation scheme: the hash code needs to be computed using
the correct technique and parameters, and only the parties involved are
capable of doing so.

To counter a MIM attack, the CDH technique improves Diffie-Hellman by
applying a hashing function. Knowing the correct function and the
appropriate parameters, the recipient is the only one who can compute the
correct hashes. An effective hashing method is proposed to determine the
hash. All Diffie-Hellman parameters, such as $P_1$, $P_2$, J, and K, are
unsafe when sent as they are, so a hash code is computed before these values
are sent across the network. The values are first converted to binary
format. Following this conversion, the number of 1's is counted and placed
in a temporary variable, Tem. Next, the value modulo Tem is computed, and
the result is appended to the binary source input $J_1$. The number of zeros
is counted to separate the hash code from the actual $J_1$ value: the zero
count determines how many special characters are inserted between the source
$J_1$ and the hash code. This modest hash computation helps cover
Diffie-Hellman's weaknesses while keeping processing time to a minimum, and
thereby aids in the prevention of MIM attacks. The steps that follow explain
how the CDH technique works.


1. The two communicating parties, 'M' and 'N', decide on the
   parametric values J and K.

2. The J and K values are communicated between the parties using
   hash codes. The steps are as follows:
   a. First convert the J and K values to binary form.
   b. Count the number of 1's in the output.
   c. Store the resulting count in a variable named 'd'. Now
      calculate J mod d, and again count the number of 1's.
   d. For the count values, add the special characters.
   e. Append the special characters to the initial binary value of J,
      and finally append the value obtained from step d.
   f. Apply the same steps to K.

3. This value is communicated to party N.

4. Party N computes the hash codes for J and K using the same
   steps, a to f.

5. Party M computes $J_1 = G^{x} \bmod J$, where $x$ is M's confidential
   value.

6. Then $J_1$'s hash is computed using steps a to f.

7. Party N computes $K_1 = G^{y} \bmod K$, where $y$ is N's confidential
   value.

8. Then $K_1$'s hash is computed using steps a to f.

9. Both resulting values are exchanged by the parties.

10. After obtaining the values, the parties again compute the hash codes.

11. Party M computes $Key_M = J_1^{x} \bmod J$.

12. Party N computes $Key_N = J_1^{y} \bmod J$.
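
The hash-code construction used in the steps above can be sketched as follows. This is one possible reading of the description, not the authors' implementation: the value is written in binary, the count of 1's is used as the modulus, and the binary remainder is appended after a run of special characters whose length is the count of 0's. The separator character `#`, the function names, and the choice to always insert at least one separator (so that the two parts can be split apart again) are assumptions.

```python
SEP = "#"  # hypothetical special character


def cdh_hash(value: int) -> str:
    """Binary value, then separators, then (value mod count-of-1's) in binary."""
    if value <= 0:
        raise ValueError("value must be a positive integer")
    bits = format(value, "b")
    ones = bits.count("1")                 # the temporary variable Tem (or d)
    zeros = bits.count("0")
    check = format(value % ones, "b")      # remainder, written in binary
    # Assumption: at least one separator, even when the value has no 0 bits.
    return bits + SEP * max(zeros, 1) + check


def cdh_verify(message: str) -> int:
    """Recompute the hash code on the receiving side and reject any mismatch."""
    bits = message.split(SEP, 1)[0]
    value = int(bits, 2)
    if cdh_hash(value) != message:
        raise ValueError("hash mismatch: value may have been altered in transit")
    return value
```

For example, 23 is `10111` in binary, with four 1's and one 0, and 23 mod 4 = 3, so `cdh_hash(23)` gives `10111#11`. If a single bit of the value is changed in transit, the recomputed code no longer matches and `cdh_verify` raises an error.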


Stage 2: Encryption 

The proposed model for encrypting and decrypting text is illustrated in
Figure 5.2. The content is first translated into ASCII numeric values and
then into binary. The secret key is likewise transformed to binary. The
binary values of the content and the secret key are then XNORed. Because the
intruder does not know the key, the XNORed value makes the text complicated
and unreadable. Once the XNOR is completed, the resulting values are shifted
one position to the left, producing fresh unreadable content, which is then
subjected to 1's complement. The final step splits the interim ciphertext
into subsets that switch positions with each other. The output cipher is
then converted back to ASCII and transferred to the destination.

The ciphertext is transformed to binary during the decoding step. The
resulting bits are split into two sections, and the positions of these
subsets are switched. The 1's complement of this data is then taken. The
content obtained in the previous stage is then shifted one bit to the right.
This content is then XNORed with the secret key created by the
Diffie-Hellman approach. As a result, the original message is recovered.


Figure 5.2 Proposed encryption and decryption process.


It is not feasible to crack this encryption given the Diffie-Hellman message
authentication process. The cipher is made more effective by employing
numerous operations. Because the text is XNORed with the secret key, it
cannot be broken unless the key is disclosed, and the secret key cannot be
learned unless each party's private parameter is known, which in turn is
infeasible because that parameter is not broadcast across the internet. The
procedure is covered in depth below. The key is initially produced via
Diffie-Hellman and translated to binary. Encryption is conducted once the
key is generated. The encryption and decryption approach is detailed in
Figure 5.3.


Encryption Procedure: 


1. The text to be sent to the destination is decided initially.

2. Each character in the text is converted to its ASCII decimal
   value, and these values are converted to binary format (8 bits).

3. The output from step 2 is then XNORed with keys of equal length
   generated by Diffie-Hellman.

4. The output from step 3 is left-shifted once.

5. The one's complement of the output from step 4 is then taken.

6. The result is then divided equally into subsets (e.g., 1100 is
   divided into the subsets '11' and '00'). The subsets exchange
   positions (e.g., 1100 becomes 0011); this interchange completely
   modifies the binary values and increases the complexity of the
   encoded values.

7. Finally, the result from step 6 is converted back to ASCII
   decimal and then to the corresponding characters.


[Figure: flowchart of the encryption and decryption paths; legible box
labels: Plaintext (in Binary), Cipher text (in Binary), Left Shift-1,
1's Complement, Cipher text (in Character), Plaintext (in Character).]


Figure 5.3 Encryption and decryption approach.


Decryption Procedure


1. The encoded cipher is received at the destination.

2. The received cipher is first converted, character by character, to
   ASCII decimals. The result is then converted to binary format (8
   bits).

3. The output is then split equally into two halves (e.g., 1011 is
   split into 10 and 11). These subsets exchange positions (e.g., 1011
   becomes 1110).

4. The one's complement of the result is taken.

5. The one's complement output is then shifted right once.

6. The result from step 5 is XNORed with the receiver's secret
   key.

7. Finally, the values are converted back to ASCII decimal and
   then to characters.




Though the working procedure of the ASEEA algorithm is simple, its
complexity lies in the key generation phase, which uses hash codes. As the
hashing appends some data to the original text, the length of the original
text becomes unpredictable to attackers. The ASEEA algorithm reduces the
computation time as well as the response time, and it is simple compared to
the other algorithms.


Stage 3: LEACH technique 

LEACH is used in WSNs to offer safe data transit amongst communicating
parties. LEACH provides an encrypted connection amongst sensors while also
protecting them from different attacks. In this study, the LEACH technique
is used for routing. The technique employs a clustering algorithm in which
cluster heads are chosen at random. Transmission begins once the cluster
head is chosen. The steps below show how communication between two nodes is
established using the LEACH standard.


1. The sender node transmits the data/secret key to its 
neighbouring cluster head. 

2. The cluster head then transmits the same to the sink. 

3. The sink then transfers the same to the cluster head of 
the receiver. 

4. The cluster head at the receiver end transmits it to the 
receiver. 

These steps are repeated. The same process is employed for transmitting 
the key as well. The advantages of choosing LEACH are that it reduces the 
data traffic in the entire transmission of both the shared key and the 
encrypted data. As routing takes place over a single hop between the 
cluster head and the other nodes, energy is saved, and the network 
lifetime is greatly enhanced. Node location information is not gathered 
while forming clusters, which in turn increases privacy. Finally, LEACH 
operates autonomously without any control from the base station, so it is 
distributed in nature. 
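
The four routing steps can be illustrated with a short sketch. It is a simplification under stated assumptions: real LEACH elects heads with a per-round probability threshold and nodes join the nearest head, whereas here heads are simply sampled at random and the node-to-head assignment is hypothetical. The names `elect_cluster_heads`, `leach_path`, and the `"sink"` label are illustrative only.

```python
import random


def elect_cluster_heads(nodes, fraction, rng):
    """Pick cluster heads at random for one round (simplified election)."""
    count = max(1, round(len(nodes) * fraction))
    return rng.sample(nodes, count)


def leach_path(sender, receiver, head_of, sink="sink"):
    """Hops taken by a message (or key) following steps 1 to 4."""
    return [sender, head_of[sender], sink, head_of[receiver], receiver]


nodes = [f"n{i}" for i in range(10)]
heads = elect_cluster_heads(nodes, fraction=0.2, rng=random.Random(0))

# Hypothetical assignment: nodes are spread over the heads by index.
head_of = {node: heads[i % len(heads)] for i, node in enumerate(nodes)}

print(leach_path("n1", "n8", head_of))
```

Every message crosses exactly four links in this model, which is the property the text relies on when it argues that traffic and energy use are kept low.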


5.5 Results and Discussion 


For evaluating the results, multiple parameters such as time for encryption, 
time for decryption, response time and cost for computation are consid- 
ered. For simulation, the LEACH standard is utilized for routing. Five to 
1000 nodes for more than 10 rounds are used. MATLAB is employed for 
simulation. 


Environmental setup 


• Platform: MATLAB 2021 
• Proposed model: LEACH for routing among WSN nodes 
• Simulation area: 100 × 100 
• Number of nodes: maximum 1000, minimum 5 
• Key size: maximum 2^10, minimum 2° 
• Text size: maximum 2^10, minimum 2° 


Computational time 

This attribute determines how long an algorithm takes to process a given 
quantity of data. The outcomes of the suggested approach are compared 
with the research findings of Singh et al. (2017). Their study evaluated 
key creation, encryption, and decryption on an input of 10 bytes, with 
the key changing in relation to the various ECC values created. Our 
suggested solution has been evaluated with three different data sizes, 
with the key size remaining the same while maintaining security. The 
suggested technique outperforms the strategy in Singh et al.'s 
investigation by updating the key at each round. As the secret key is 
produced using the MDH technique, it is extremely challenging to crack, 
because neither party's secret key is ever sent across the internet and 
arbitrary numbers are utilised to establish the generic attributes for 
the public key. Figure 5.4 details the computation time of the ASEEA 
algorithm and is generated from the data in Table 5.1. 


Figure 5.4 Computational time for the input data. 
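
The property that neither party's secret key is ever transmitted can be seen in a textbook Diffie-Hellman exchange. The sketch below is not the chapter's MDH scheme, whose details are not given in this section: the toy public parameters `P` and `G`, the SHA-256 step standing in for the hash, and the function names are all assumptions made for illustration.

```python
import hashlib
import secrets

# Toy public parameters (assumptions for illustration only): the Mersenne
# prime 2**127 - 1 and base 3. Real deployments use standardised groups.
P = 2**127 - 1
G = 3


def make_keypair():
    """Return (private, public); only the public value is ever transmitted."""
    private = secrets.randbelow(P - 3) + 2  # secret exponent in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def derive_key(own_private, peer_public):
    """Combine our secret with the peer's public value, then hash the result."""
    shared = pow(peer_public, own_private, P)
    # Hashing the shared secret stands in for the chapter's hash step.
    return hashlib.sha256(str(shared).encode()).hexdigest()


a_private, a_public = make_keypair()
b_private, b_public = make_keypair()

# Both sides arrive at the same key without sending their private values.
assert derive_key(a_private, b_public) == derive_key(b_private, a_public)
```

Calling `make_keypair()` again for each round gives a fresh key per round, which mirrors the per-round key update described in the text.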


Time for Generating the Key 

This attribute measures how long it takes to produce a key each time. 
Figure 5.5 details the time for generating the key using the ASEEA 
algorithm and is generated from the data in Table 5.1. The time is 
measured in nanoseconds. Diffie-Hellman is employed to generate unique 
keys. Various key sizes are assessed with respect to how long it takes to 
process different sizes of data. A data size of 10 has a key generation 
time of around 37,334 nanoseconds, which is appropriate for the WSN 
context. The comparison of the enhanced Diffie-Hellman algorithm with the 
existing Diffie-Hellman algorithm is presented in Table 5.2. From the 
results, it is understood that the enhanced version of the existing 
algorithm outperforms it in terms of computation time and key generation 
time. 


Table 5.1 Result analysis of the proposed technique. 

[Table body only partly legible. Times in ns, in printed order. 
Message of length 7: 35,383; 15,540; ...583; ...966. 
Last row: 56,988; 212,650; 170,478; 438,511.] 


Figure 5.5 Key generation time for the input data. 


Table 5.2 Comparison of existing and proposed Diffie-Hellman algorithm. 


Parameters considered 

Raw input    | Length         | Private key |               | Generation of key (ns)       | Computational time 
             |                |             |               | Existing DH | Proposed ASEEA | Existing DH | Proposed ASEEA 
Welcome      | 7              | 5           | Same as above | 30,501      | 35,383         | 73,856      | 77,966 
Hello Guys   | 9 (with space) | 5           | Same as above | 33,521      | 37,448         | 79,961      | 83,916 
Pleasant Day |                | 5           | Same as above | 51,982      | 56,988         | 424,514     | 438,511 

DH = Diffie-Hellman. 


Figure 5.6 Encryption time for the input data. 


Encryption Time 

This attribute measures the time necessary to execute the suggested 
cryptographic operations. The technique that consumes the least amount of 
time to encrypt is deemed effective. The suggested encryption technique 
is evaluated for various data sizes with respect to the time required for 
each input. Figure 5.6 compares the time required to encrypt with respect 
to data length. If the input is 15 bytes long, the suggested encryption 
algorithm requires approximately 59,000 ns to encrypt it; 59,000 ns is 
5.9 × 10^-5 s, which is very quick. In general, the proposed method has a 
time complexity of O(e^n), where n is the number of bytes of data. 
Because the suggested solution requires a relatively short time to 
encrypt, it could be regarded as a superior technique for the WSN 
context. 
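
Timings of this kind can be collected with a small harness such as the one below. It is a sketch only: the chapter's measurements were made in MATLAB, `toy_encrypt` is a placeholder single-byte XOR and not the ASEEA algorithm, and the helper names are hypothetical.

```python
import time


def toy_encrypt(data: bytes, key: int) -> bytes:
    """Placeholder cipher (single-byte XOR); NOT the ASEEA algorithm."""
    return bytes(b ^ key for b in data)


def time_ns(func, *args, repeats: int = 1000) -> float:
    """Average wall-clock time of func(*args) in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(repeats):
        func(*args)
    return (time.perf_counter_ns() - start) / repeats


def ns_to_seconds(ns: float) -> float:
    """Convert nanoseconds to seconds (1 s = 10^9 ns)."""
    return ns / 1_000_000_000


for size in (15, 45):
    elapsed = time_ns(toy_encrypt, b"x" * size, 0x5A)
    print(f"{size} bytes: {elapsed:.0f} ns = {ns_to_seconds(elapsed):.2e} s")
```

Averaging over many repetitions reduces the influence of timer resolution and scheduling noise, which matters when individual operations take only tens of microseconds.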


Decryption Time 

This attribute measures the amount of time it takes to recover the 
original text from the received ciphertext. The suggested approach is 
evaluated for varied amounts of content in terms of how much time it 
requires to decrypt each input when the key is already known. Figure 5.7 
compares the time required to decrypt with respect to data length. If the 
input is 45 bytes long, the suggested technique requires roughly 
122,990 ns to decrypt it. This time applies only if the key is shared; 
without the key, decryption is not feasible. As the number of bytes 
rises, so does the time required for both encryption and decryption. The 
suggested technique requires less time than the method presented by Singh 
et al. (2017). In general, the proposed method has a time complexity of 
O(e^n). As seen in Figure 5.7, the proposed model enhances security while 
keeping encryption and decryption times minimal. 


Figure 5.7 Decryption time for the input data. 


5.6 Analysis of Various Security and Assaults 


Numerous measures are employed to assess the trustworthiness of an algo- 
rithm across multiple assaults. When dealing with cryptanalysis, numer- 
ous assaults on the internet might occur, which must be considered when 
delivering a safe cryptography strategy. 


Plaintext Assault 

This assault happens when the adversary has access to both the plaintext 
and its ciphertext. This is viewed as a relatively elementary assault on 
a cryptographic system. The hacker can capture parts of the plaintext 
whenever the transmitter submits content for encryption. The key is never 
revealed to the hacker since it is never sent in the clear. The hacker 
attempts to reconstruct the encryption scheme, which is then utilised for 
decoding ciphertext, using some existing ciphertext and plaintext. The 
suggested technique does not communicate plaintext across the internet; 
only the ciphertext is transmitted. Because the key is not communicated 
across the internet, this assault becomes extremely challenging to carry 
out. Even if the assailant acquires some ciphertext, this assault is 
still hard to execute because the key and ciphertext are changed at every 
round. 


Ciphertext Assault 

A ciphertext-only assault (COA) is a cryptographic assault model in which 
the assailant is assumed to have access only to a set of ciphertexts. The 
assault succeeds if the associated plaintexts or the key can be deduced. 
This assault is extremely challenging to carry out because, while 
ciphertext is delivered over the internet, the secret key is never 
transmitted across the system. Instead of sending the key over a secure 
channel, the suggested solution lets both parties derive it from common 
parameters. Because the secret key is never conveyed, it remains unknown 
to the assailant, making decryption of the ciphertext very hard. The 
ciphertext cannot be decrypted without the secret key, so this assault is 
unlikely to succeed if the suggested approach is followed. The ciphertext 
could be analysed only if the secret key were retrieved, and retrieving 
the secret key is extremely challenging under the recommended technique. 


Related-key assault 

A related-key assault is a type of cryptanalysis in which the assailant 
may observe the operation of a cipher under multiple distinct keys whose 
values are initially unknown, while the assailant is aware of some 
mathematical relationship linking the keys. For instance, the assailant 
may be aware that the final 80 bits of the secret keys are always 
identical, even though the assailant does not initially know what those 
bits are. Because the suggested solution does not employ the same key for 
every round of LEACH, this assault becomes more challenging to carry out. 
Even if the assailant learns a key, it will only work for a single 
encrypted text because a new key is created at each round. 


Man-in-the-middle assault 

Three parties are involved in a MIM assault: the victim, the party with 
whom the victim is attempting to communicate, and the "man in the 
middle," who intercepts the victim's communications. Basic Diffie-Hellman 
is highly susceptible to MIM, since the exchange is unauthenticated. Such 
an exploit would reduce protection by giving the assailant access to all 
secret attribute values. In the customised Diffie-Hellman, hashing is 
utilised to protect the suggested technique from this assault. Because 
only the two interacting parties can create the right hash value, this 
assault becomes ineffective against the suggested technique. 


5.7 Conclusion 


A WSN is a distributed network that incorporates many spatially 
dispersed, autonomous, compact, low-power units known as sensors. 
Although sensor networks are widely utilised, they are quite constrained 
owing to the restricted amount of power and storage available to them. 
Data protection is a key challenge with WSNs, along with many others, 
because data passing through the internet is never secure, as intruders 
can gain access to it. The information must be safeguarded from the 
adversary; therefore, it is encrypted and converted into an unreadable 
format. Multiple ways have been employed to protect the data for privacy; 
however, owing to weaknesses, these techniques are not regarded as viable 
for WSNs. A secure and simple data encryption method is suggested. This 
strategy uses less computing time. Because the suggested technique has a 
shorter processing and response time, it is a strong match for WSN 
cybersecurity. The suggested method also resists plaintext, 
ciphertext-only, related-key, and MIM assaults. 


Abstract 

Blockchain is widely discussed today and is rapidly gaining popularity. 
Although it has already altered many people's lifestyles in certain ways, 
critics have highlighted worries about its scalability, security, and 
sustainability, given its tremendous impact on several sectors and 
enterprises. Numerous industries, including banking, healthcare, 
transportation, risk management, the Internet of Everything (IoE), as 
well as social and public services, have adopted blockchain, which is a 
decentralised technology. It has considerable ability to resolve 
commercial issues. Each transaction on a blockchain is connected to prior 
transactions or records, and the records are encrypted. Algorithms 
running on nodes verify the transactions happening through the 
blockchain. No single individual or entity can start a transaction alone. 
Ultimately, blockchains offer transparency, enabling any user to keep 
track of transactions at any moment. In this chapter, we perform an 
in-depth analysis of blockchain technology by looking at the scenarios 
and problems from the privacy and security viewpoints. 

and future scope 


6.1 Introduction 


Blockchain technology has enormous potential for a wide range of appli- 
cations and offers numerous opportunities for various infrastructure. 
Resource management is encouraged by the technology, which also guar- 
antees effective and safe communication [1]. One question that is becoming 
increasingly common is about the cryptocurrency bitcoin and the technol- 
ogy source that powers it, known as blockchain. Confidence is raised when 
parties carry out banking transactions with blockchain since it lowers the 
risk of theft and instantly creates a record of activity, thus establishing an 
automated background investigation for each system user. Because of its 
decentralised nature, blockchain creates dependability and lowers the risk 
involved in entering into a business agreement with an unfamiliar party. 
Blockchain technology is crucial to the advancement of industry. The 
development of privacy protocols and blockchain decentralisation technol- 
ogies for safeguarding, data services, auditing, and regulating transactions 
on digital platforms might be advantageous to many businesses. Blockchain 
is based on decentralised and secure distributed protocols without a cen- 
tral authority or source of control, and data blocks are created, added to, 
and confirmed by network nodes themselves. Blockchain is a distributed 
ledger that utilises cryptographic algorithms [2, 3]. Images, texts, 
video calls, and voice conversations may all be sent and received 
directly over the internet. For a financial transaction, however, the 
sender and the recipient must still rely on a trusted third party. In the 
conventional system, consumers have to depend on a third party to execute 
their financial activity. Blockchain, on the other hand, offers 
transactional protection without such an intermediary. Every transaction 
is recorded in a block, which behaves as a record book. 
Every time a transaction is finished, a block is appended to the blockchain, 
which serves as a permanent database. When a current block is finished, 
a new block is usually added or generated. Every block contains a hash of 
the block before it. A distributed peer-peer network is utilized to man- 
age bitcoin, the first decentralised digital currency ever created. Bitcoin 
was subsequently acknowledged as the leading currency with respect to 
user acceptance and broad use [4]. The blockchain’s primary function is to 
record time-stamped data from transactions in data blocks that are linked 
together in a chain in the order in which they occurred. Every block is 
given a distinct hash value through a cryptographic procedure to ensure 
the integrity of the data. Similar to the pointers in a linked list, 
these hash values serve as linkages between the blocks. Asset owners can 
use blockchain to track and trade valuable items, such as outstanding 
invoices, in a secure, transparent, private, and self-reconciling "chain" 
of transactions. Because every block contains the hash of the previous 
block, the blocks that make up the blockchain are easy to link [5]. 
Blockchain-based networks took a while to catch on because of their 
complex design, but eventually 
a variety of global businesses, including the financial, healthcare, 
logistics, manufacturing, and energy sectors [6] started to pay attention 
to them. The blockchain operates inside a sophisticated framework that 
brings together a variety of other widely used technologies, such as distrib- 
uted environments, decentralised architecture, peer-to-peer networking, 
encryption and others [7]. Blockchain technology has been employed in 
areas other than digital currency, such as the Internet of Things. Blockchain 
technology could be used in many everyday business-to-business trans- 
actions in the future, including those powered by enterprise applications 
[8,9]. Autonomous marketplaces for other assets are likely to proliferate. 
Because the software is a controlled and open framework that is visible to 
all transaction participants, a blockchain-based transaction eliminates the 
need for third-party oversight. Blockchain technology is expected to have 
huge implications when it becomes more widely used, radically changing 
how people use the internet [10-12]. On a commercial level, it is being 
adopted by IT businesses to enhance efficacy and streamline business 
activities. In order to support the growth of businesses and carry out 
independent blockchain research, key industry players like Google and 
Microsoft have built frameworks that provide blockchain 
architecture-based services to clients. 


6.2 Blockchain Technology 


Bitcoin allows for decentralised peer-to-peer exchange of digital cash 
via the internet. In bitcoin, the blockchain is implemented as a 
distributed ledger that is open to all users and is hosted by several 
willing hosts, or nodes. Blockchain is a method of recording information 
that makes it difficult or impossible to change, hack, or manipulate the 
system. A blockchain, like a database, stores information electronically 
in digital format. Blockchains are best known for their critical role in 
cryptocurrency systems like bitcoin, where they keep a secure, 
decentralised record of transactions. A blockchain is a distributed 
ledger that duplicates and distributes transactions across the 
blockchain's network of computers. The ledger authenticates and records 
transactional data sent via the network. A distinguishing feature of the 
blockchain in bitcoin is its capacity to prevent double spending, given 
that the whole network, as opposed to a single authority as in 
traditional financial frameworks, is responsible for verifying the 
transactions. In its most basic form, a blockchain is a ledger that 
securely extends the chain over time while recording transactions in an 
immutable, append-only manner. 

Cryptographic methods are used to protect the blocks within the 
blockchain, ensuring the integrity of the transactional information [13]. 
The integrity of the data is ensured by the permanent records that make 
up the blockchain, which cannot be changed or tampered with. The 
distributed nodes of the network are used to transmit the data on the 
blockchain. The blockchain is distinct from previous technologies in that 
it provides an overall order of blocks by timestamping data entries. The 
hash of the data inside a block and the hash of the block preceding it 
serve to link the chain of blocks together. The existing blockchain can 
only be extended with a new block if the consensus technique has been 
properly applied. This consensus procedure must control chain entry 
rights, follow security protocols for block verification, and guarantee 
record consistency across all network nodes. The blockchain is thus a 
distributed ledger that uses a decentralised network to verify all 
financial information safely and impartially. Security concerns, 
including data or transaction breaches, reliance on a third party or 
centralized body, and the unpredictability of other parties, may be 
addressed by a blockchain-based system. Blockchain technology has 
developed significantly and undergone an evolution; Figure 6.1 
illustrates these phases. 

The first iteration of blockchain technology, known as version 1.0, 
provided a decentralized public ledger for cryptocurrency. Version 2.0 of 
the blockchain adds smart contracts, a system for maintaining trust in 
transactions that runs independently without the involvement of any other 
parties. The third stage, known as blockchain version 3.0, represents the 
technology's present and future. It encompasses a number of application 
domains, including distributed finance, IoE, identity management, 
education, big data, artificial intelligence, healthcare, and others 
[14]. 

[Figure 6.1 panels: Blockchain Version 1.0, a cryptocurrency ledger; 
Version 2.0, insightful contracts; Version 3.0, cloud, open chain 
access.] 

Figure 6.1 Blockchain development stages. 


6.3 Blockchain Revolution Drivers 


Blockchain is a fundamental innovation that, in addition to its 
cutting-edge architecture and applications, paves the way for a change in 
thinking from trusting people to trusting machines and from central to 
distributed authority. To fully grasp blockchain's potential, we may 
study it from two angles. First, blockchain may be viewed as an 
Information and Communications Technology (ICT) that keeps track of who 
owns what assets on and off platforms, as well as the rights and duties 
that come with contracts. 

A blockchain may be used to store any kind of data, including ownership 
of assets, contractual responsibilities, copyrights for creative works, 
credit exposures, and digital identities. Second, blockchain may be 
considered an institutional technology that decentralises the governance 
frameworks that serve as the foundation for social and economic 
decision-making. While we use an ICT viewpoint, the major forces behind 
the blockchain revolution may be explained from both an institutional and 
an ICT standpoint. These elements are illustrated in Figure 6.2 and are 
explained as follows. 


6.3.1 Transparent, Decentralised Consensus 


The sequence in which applications, activities (deploy and invoke), and 
data have been performed, updated, or produced is confirmed via a 
blockchain mechanism known as consensus. The correct sequence is crucial 
because it determines ownership, which in turn gives rise to rights and 
duties. Blockchain networks are decentralised, meaning there is no 
central hub or authority that decides what happens when, accepts 
transaction activities, or creates rules that allow nodes to communicate 
with each other. 


[Figure 6.2 elements: distribution and transparent consensus; security 
and immutability; automation and anonymity; identity and access; effect 
on services, business, and regulation.] 


Figure 6.2 Blockchain elements. 


6.3.2 Model of Agreement(s) 


The consensus model or models aid in maintaining the veracity of data 
stored on blockchains. When a consensus method fails, issues such as 
blockchain splits, consistency breakdowns, dominance issues, problems 
with verifying nodes, and inferior network functionality may surface. A 
consensus method is judged on the following three properties, which 
determine its practicality and efficacy: 


1. Safety - For a consensus protocol to be secure and 
dependable, each node must produce output that complies 
with the protocol's specifications. 

2. Liveness - A consensus method provides liveness if all 
active, non-faulty nodes eventually produce a value. 

3. Fault Tolerance - A fault-tolerant protocol tolerates 
failures while still allowing recovery when a 
participating node fails. 


6.3.3 Immutability and Security 


A blockchain is a shared, tamper-evident, replicated ledger that uses 
one-way cryptographic hash techniques to make records irreversible and 
nonrepudiable. A reliable historical database that has received universal 
support helps to increase confidence in the system. Unless a person or 
entity has influence over most of the miners (voters), it is very hard 
for anyone to tamper with the record. Blockchain has been referred to as 
the "trust machine" by The Economist. 


6.3.4 Anonymity and Automation 


Through blockchain, a collection of individuals may work together while 
accessing global data sources, with automated reconciliation between all 
contributors. Using public/private key technology, ownership claims to 
the material are asserted and data transfers are authorised, eliminating 
the need for personal communication, trust providers, validation, or 
adjudication. The application makes sure that duplicated or conflicting 
data cannot be permanently entered into the ledger. Automation is the use 
of algorithms (smart contracts or smart contract software) to monitor, 
evaluate, and control the implementation of contracts in an automated 
manner. 

Blockchain technology's basic goal is to seamlessly transfer trust from a 
particular central body to the entire network. The blockchain network's 
nodes each have the capacity to safely send and update data. A 
decentralised autonomous organisation (DAO), also known as a 
decentralised autonomous corporation (DAC), is an organisation that 
abides by protocols delivered as computer programmes known as smart 
contracts. Blockchain uses blocks to record the details of smart 
contracts and transaction history. 


6.3.5 Impact on Business, Regulation, and Services 


Both the public and commercial sectors have high hopes for blockchain 
technologies since they are the cornerstone for building peer-to-peer 
networks for exchanging data, assets, and digital goods without 
middlemen. Blockchain could immensely improve the application of 
governance and regulatory controls across a variety of economic sectors 
and in wholly new ways. In the framework of the present fourth industrial 
revolution, which is characterised by the fusion of numerous technologies 
that blur the boundaries between physical and virtual space, blockchain 
is a component of a wider toolset. Blockchain has the potential to upend 
numerous businesses and society as a whole when paired with other 
cutting-edge technologies like AI, driverless cars, fog computing, and 
machine learning. 


6.3.6 Access and Identity 


Three categories, public (permissionless), private (permissioned), and 
consortium, determine a blockchain's identity and access model. An 
in-depth discussion of these blockchain categories may be found in [15]. 
On a private blockchain, users have fewer options for creating smart 
contracts and validating block transactions. This is suited to 
conventional enterprises and governance structures [16]. Public 
blockchains are designed to maintain the degree of security while 
eliminating the middleman from transactions [19]. On a public blockchain, 
anyone with an internet connection may join the network by creating smart 
contracts and participating in block verification. 


6.4 Blockchain Classification 


Despite the fact that the structure, accessibility, and verification of 
blockchain technology are continually evolving, many application areas 
are embracing it. Users can select from the three types of blockchains 
described below based on their demands and circumstances [20-22]. While 
they differ from one another, all of these blockchain models have certain 
essential characteristics in common, such as a decentralized 
architecture, peer-to-peer interaction, consensus operations, and 
timestamping. Public, private, and consortium blockchains are the three 
types of blockchains shown in Figure 6.3. 


Figure 6.3 Types of blockchain. 


6.4.1 Public Blockchain 


Access to data on a public blockchain's distributed, open network is not 
subject to any restrictions. Nevertheless, writing to a public blockchain 
may be either permissioned or permissionless. In a permissioned network, 
only a select few nodes are authorized to perform new transactions (which 
are recorded in the blockchain), verify the transactions made by other 
nodes, and review the transaction log. If the network has no permission 
limitations, anybody can write to it as well as read the blockchain. 
Proof-of-work consensus enhances the trustworthiness of the public 
blockchain. Such a blockchain is viewed as safe, since a large number of 
nodes often join the network as soon as it is made accessible to the 
public, and more nodes mean a more distributed network. Additionally, the 
blockchain is transparent since all nodes may view the ledger. However, 
there are several limitations, such as the poor processing speed caused 
by the network's many nodes. Since proof-of-work needs a lot of time and 
effort to verify requests, such blockchains have problems with 
scalability and efficiency. The most popular public blockchains on the 
market right now are Bitcoin [3], Litecoin [17], and Ethereum [18]. 
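
The cost of proof-of-work mentioned above can be seen in a toy miner. The sketch is simplified and its names are hypothetical: Bitcoin, for example, applies double SHA-256 to a binary block header and compares the result with a numeric target, whereas this version just searches for a hash with a given number of leading zero hex digits.

```python
import hashlib


def meets_target(block_data: str, nonce: int, difficulty: int) -> bool:
    """True if the hash of data plus nonce starts with `difficulty` zero hex digits."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def mine(block_data: str, difficulty: int) -> int:
    """Try nonces 0, 1, 2, ... until one meets the target (brute force)."""
    nonce = 0
    while not meets_target(block_data, nonce, difficulty):
        nonce += 1
    return nonce


nonce = mine("example transactions", difficulty=3)

# Finding the nonce takes many hash attempts; checking it takes one.
assert meets_target("example transactions", nonce, 3)
```

Each extra zero digit multiplies the expected number of attempts by 16, which is why verification stays cheap while mining becomes the scalability bottleneck the text describes.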
6.4.2 Private Blockchain 


A private blockchain's structure includes a few restrictions and operates 
in a closed system. These solutions are preferred when a company wants a 
blockchain with access and involvement limited to a small number of 
users. Additionally, no outsider is allowed to see the information or be 
involved in transaction activities inside the blockchain [21]. Such a 
blockchain may be used to secure consumer resources, monitor supply 
chains for artificial intelligence-based people, etc. It is managed by 
the enterprises themselves. These blockchain networks may be either 
permissioned or permissionless within the private group of users. Private 
blockchains perform computations more rapidly than public blockchains 
because of the lower number of participating nodes, and they are 
scalable, allowing the number of nodes to be changed according to demand. 
As a result, transaction verification and validation are done more 
effectively. One significant disadvantage of private blockchains is that 
they don't provide the same degree of decentralised security as public 
blockchains. 


6.4.3 Blockchain Consortium 


Partial decentralisation may be demonstrated in the consortium 
blockchain, which blends public and private blockchains. In this 
blockchain network, the nodes have the power to decide in advance whether 
the data or transaction details are public or private. It is critical to 
comprehend the differences between a consortium and a fully private 
blockchain. It is possible to think of this blockchain, also known as a 
federated blockchain, as a public blockchain with permissions, where 
anybody may access data through the network but only the representative 
nodes are permitted to write data to the network. Figure 6.4 displays the 
corresponding pattern representations. This blockchain features a lot of 
nodes, much like a public blockchain and unlike a private one, but the 
nodes are subject to specific restrictions. This kind of blockchain is 
often used in the financial and government sectors; examples include R3 
and the Energy Web Foundation. 


Figure 6.4 Blockchain type pattern representation. 


6.5 Blockchain Components and Operation 


A blockchain is a distributed network used to record logs of financial
information securely. As shown in Figure 6.5, the blockchain maintains
data in the form of blocks that are chained together. A block containing
the activity of a given period is issued at the end of that period, so the
blockchain grows in size as more transactions take place. When a
transaction is requested, the mining procedure starts and the request is
broadcast to all network nodes for approval under the consensus protocol.
The block is appended to the chain only after the other nodes have
verified it.

A blockchain is a growing list of records, known as blocks, that are
linked together cryptographically. Each block generally contains
transaction information, a timestamp, and a cryptographic hash of the
preceding block. Thanks to the efficient structure of blockchain data,
the linked record of transaction blocks may be stored in a small database
or in flat files. Each block in the chain refers to the block that came
before it. The first block of a chain is its genesis block. The blockchain
is usually shown as a vertical stack, with the genesis block at the bottom
and later blocks placed on top of one another. More information on the
structure of a blockchain is provided in [23, 24]. All blocks are uniquely
identified by cryptographic hashes that are produced by the SHA-256
algorithm and stored in the block header.

Figure 6.5 Blockchain structure.
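
The linking of blocks through hashes can be illustrated with a short
Python sketch. It is a simplified illustration, not the data layout of any
real blockchain: the header fields (timestamp, prev_hash, tx_digest) and
the helper names block_hash, make_block, and chain_is_valid are
assumptions introduced here for the example.

```python
import hashlib
import json
import time


def digest_of(obj) -> str:
    """SHA-256 hex digest of a JSON-serialisable object."""
    encoded = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def block_hash(header: dict) -> str:
    """Hash of a block header (hypothetical header layout)."""
    return digest_of(header)


def make_block(transactions: list, prev_hash: str) -> dict:
    """Build a block whose header stores the hash of the preceding block."""
    header = {
        "timestamp": time.time(),
        "prev_hash": prev_hash,
        # The header commits to the transactions through their digest.
        "tx_digest": digest_of(transactions),
    }
    return {"header": header, "transactions": transactions,
            "hash": block_hash(header)}


def chain_is_valid(chain: list) -> bool:
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block["header"]):
            return False
        if block["header"]["tx_digest"] != digest_of(block["transactions"]):
            return False
        if i > 0 and block["header"]["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# The genesis block has no predecessor, so a placeholder hash is used.
chain = [make_block(["genesis"], prev_hash="0" * 64)]
chain.append(make_block(["alice pays bob 5"], prev_hash=chain[-1]["hash"]))
chain.append(make_block(["bob pays carol 2"], prev_hash=chain[-1]["hash"]))
```

Changing any stored transaction after the fact makes chain_is_valid
return False, because the recomputed digest no longer matches the value
committed in the header.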


6.5.1 Data 


The data kept in a blockchain depends on the services and applications
that use it. Peer-to-peer file systems such as Storj, Ethereum Swarm, and
Sia can use it to store files in the cloud. Other uses of the stored data
include recording transaction information, banking, contracts, and IoT.


6.5.2 Hash 


Let’s start by defining a cryptographic hash function, as it is essential
to both of the techniques discussed below. A cryptographic hash function
produces a fixed-size hash from a variable-length input. In other words,
it maps an arbitrarily large input to a fixed-size bit array (the hash).


Input of Any Length --> Constant Length Output (Hash)


Data integrity checks and file identification frequently involve the usage 
of cryptographic hash algorithms. Comparing hashes is quicker and sim- 
pler than comparing the actual data. Additionally, they are employed for 
password verification, database storage of private information (such as 
passwords), and authentication purposes. 
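
The fixed output length and the use of hashes for integrity checks can be
sketched with Python's standard hashlib module. The helper name sha256_hex
and the sample byte strings are assumptions made for this illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Inputs of very different lengths give digests of the same length.
short_digest = sha256_hex(b"a")
long_digest = sha256_hex(b"a" * 1_000_000)

# Integrity check: compare digests instead of the full contents.
original = b"patient record v1"
received = b"patient record v1"
tampered = b"patient record v2"
is_intact = sha256_hex(received) == sha256_hex(original)
is_tampered = sha256_hex(tampered) != sha256_hex(original)
```

Both digests are 64 hexadecimal characters (256 bits) long, regardless of
whether the input is one byte or one million bytes.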


6.5.3 MD5 


MD5 is a cryptographic hash function that creates a 128-bit hash from
data of any length. Despite being regarded as cryptographically broken,
it is nonetheless commonly used in several contexts. One of the most
popular uses is verifying the integrity of files exchanged for PR
purposes. MD5 processes its input in 512-bit blocks, each divided into 16
words of 32 bits, and produces a 128-bit hash as a result. As noted above,
MD5 is regarded as cryptographically broken, so its security deserves a
closer look. Practical collision attacks against MD5 exist, and such
methods can produce collisions on an ordinary computer within a minute.
These results are sufficient justification to stop using MD5 in
applications that need collision resistance, including digital signatures.
MD5 is no longer advised for solutions requiring a high level of security.
However, it is still frequently used as a file checksum, and it is one
message authentication method for verifying the content of outsourced
files [25, 26].


6.5.4 SHA-256


SHA is an extensively used family of hash algorithms. SHA-256 ranks among
the safest and most widely used hashing techniques. To begin with, it is
a one-way function, so it is very hard to deduce the input from the hash.
A brute-force attack would need on the order of 2^256 attempts to be
successful. Second, SHA-256 is collision-resistant, because there are
2^256 different possible hash values. As a result, in practice there is
essentially no danger of a collision. Lastly, SHA-256 exhibits the
avalanche effect: a slight change to the input produces a completely
different hash. In summary, SHA-256 meets all the important requirements
of a cryptographic hash function, and it is therefore widely employed in
applications that require a high degree of security.
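
The avalanche effect can be observed directly. The following Python sketch
hashes two inputs that differ in a single character and counts how many of
the 256 output bits differ; the helper names and the sample strings are
assumptions chosen for illustration.

```python
import hashlib


def sha256_int(data: bytes) -> int:
    """Interpret the 256-bit SHA-256 digest as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def differing_bits(a: bytes, b: bytes) -> int:
    """Number of bit positions in which the two digests differ."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")


# The two inputs differ only in their last character.
changed = differing_bits(b"blockchain", b"blockchaim")
```

For a well-behaved hash function, roughly half of the 256 output bits are
expected to change, even though only one input character was altered.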


6.5.5 MD5 vs. SHA-256 


First of all, MD5 produces 128-bit hashes, whereas SHA-256 produces
256-bit hashes. In addition, SHA-256 is safer than MD5, especially in
terms of resistance to collisions. This means that applications with
extensive security requirements should not use MD5. SHA-256, on the other
hand, is used for high-security operations such as SSL handshakes and
digital signatures. Furthermore, fewer vulnerabilities have been
documented for SHA-256 than for MD5. Because of its weak cryptographic
design, MD5 can be attacked even with an ordinary computer.

Speed-wise, MD5 is slightly quicker than SHA-256, which is why the MD5
checksum is still commonly used to confirm the integrity of data. Overall,
however, SHA-256 is the better choice: it is more robust, more reliable,
and less prone to being broken. The longer hash makes the algorithm run
more slowly, but the fact that SHA-256 is marginally slower than MD5 is
not really relevant unless speed is the main factor. SHA-256 therefore
achieves a good balance between security and speed. On the other hand,
systems where speed is the most essential element and where a high degree
of security is not required can use MD5. SHA-256 is not the fastest
algorithm overall, but when hashing short strings it is approximately 30%
faster than SHA-512.

For each case, three measurements were taken, and an average value was
determined. Times are given in milliseconds per 1,000,000 hashing
operations. The measurements were made on a personal computer (PC) running
a 64-bit Windows 10 operating system, with a single Intel i7 2.60 GHz core
and 16 GB of RAM. UUID stands for universally unique identifier; a UUID is
a 128-bit value. The results are shown in Table 6.1 and Figure 6.6.

Table 6.1 Time to encrypt (in milliseconds) for file sizes in characters.

Data to encode (length) | MD5 hash, average 1m (ms) | SHA-256 hash, average 1m (ms)

Figure 6.6 Hashing time in ms vs. file size in characters.
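
A comparison of this kind can be reproduced approximately with Python's
standard library. The sketch below is not the procedure used to obtain
the figures reported in this chapter: the payload sizes, the repeat count,
and the helper name time_hash are assumptions, and absolute timings depend
on the machine. Some hardened Python builds disable MD5, in which case the
md5 entries would need to be removed.

```python
import hashlib
import timeit
import uuid


def time_hash(algorithm: str, payload: bytes, repeats: int = 10_000) -> float:
    """Total time in milliseconds for `repeats` hash computations."""
    seconds = timeit.timeit(
        lambda: hashlib.new(algorithm, payload).digest(), number=repeats
    )
    return seconds * 1000.0


# Hypothetical payloads: one UUID string and two longer character strings.
payloads = {
    "uuid (36 chars)": str(uuid.uuid4()).encode("ascii"),
    "1,000 chars": b"x" * 1_000,
    "100,000 chars": b"x" * 100_000,
}

results = {
    (label, algorithm): time_hash(algorithm, data, repeats=200)
    for label, data in payloads.items()
    for algorithm in ("md5", "sha256")
}

for (label, algorithm), ms in sorted(results.items()):
    print(f"{label:>16} {algorithm:>7}: {ms:.3f} ms per 200 hashes")
```

Averaging several runs, as described above, reduces the influence of other
processes running on the same machine.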


Timestamp: It is essential to record the moment at which the block was
generated. Timestamping is a secure way of tracking when a document was
created or modified. Because it enables the parties to ascertain the
source and availability of a document at a given date and time, this
approach is quickly becoming a crucial instrument in business.

Additional information: Other data in a block includes the nonce, digital
signatures, and a few user-defined values. Each user has two keys, a
private key and a public key. Together they make digital signatures
possible: one key is used for signing and the other for verifying. The
private key is kept confidential and is used to sign, and thereby approve,
a transaction. The public key, which is known to everyone, is used to
verify the signature during the transaction verification phase, hence
ensuring data validity. The nonce is a 4-byte number that starts at 0 and
is incremented every time a hash computation is made. The target threshold
value of a valid block hash is determined by the nbits value, in
accordance with [27, 28].
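
The role of the nonce and the target threshold can be sketched as follows.
This is a deliberately simplified illustration: the header bytes, the
single SHA-256 pass, the big-endian encoding, and the very easy target are
all assumptions made so that the search finishes quickly, and they do not
reproduce the exact block format of Bitcoin or any other network.

```python
import hashlib


def mine(header: bytes, target: int, max_nonce: int = 2**32) -> int:
    """Return the first 4-byte nonce whose block hash is below target."""
    for nonce in range(max_nonce):
        # The nonce starts at 0 and rises with every hash computation.
        candidate = header + nonce.to_bytes(4, "big")
        digest = hashlib.sha256(candidate).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
    raise ValueError("no valid nonce in the 4-byte range")


# Easy illustrative target: the hash must start with about 12 zero bits.
target = 1 << (256 - 12)
header = b"prev_hash|tx_digest|timestamp"  # hypothetical header contents
nonce = mine(header, target)
```

Lowering the target makes valid hashes rarer, so more nonce values have to
be tried on average before a block is accepted.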


6.6 Blockchain Technology Applications 


6.6.1 Blockchain Technology in the Healthcare Industry 


In today’s society, patients are hesitant to share medical details with
strangers. Blockchain technology lets the patient shield this information
from prying eyes, and the blockchain can be accessed through a smartphone
app or a website. Healthcare blocks primarily hold health records,
documents, and images. Reference [29] discusses the healthcare blockchain
in great detail and shows how the data affects both storage and
throughput. If data were stored on Bitcoin-inspired blockchains, each user
would have a copy of every user’s health record. Blockchain technology
enables users to access a single data source to obtain quick, accurate,
and comprehensive healthcare data. It should come as no surprise that
safeguarding sensitive medical information is a top priority.


6.6.2 Stock Market Uses of Blockchain Technology 


By cutting out the middlemen in the transfer of property rights,
blockchain technology reduces the expenses associated with exchanging
assets, increases access to global markets, and reduces volatility in the
traditional securities market. It is crucial to keep track of who owns
what when people purchase or exchange assets such as stocks, mortgages,
or commodities. Today’s capital markets involve brokers, exchanges,
central securities depositories, intermediaries, and banks. Blockchain
technology has the potential to solve interoperability, trust, and
transparency issues in these fragmented market systems. Stock market
participants such as traders, brokers, regulators, and stock exchanges
must currently go through a time-consuming process.

These parties rely on an antiquated, lax, and ineffective paper-based
ownership system [30]. Blockchain can significantly increase the
efficiency of stock exchanges through automation and decentralization. To
a large extent, blockchain can eliminate the need for third-party
regulators, since the rules and regulations are built into smart contracts
and enforced with each trade as transactions are registered, with the
blockchain network acting as a regulator for all transactions. Using
blockchain and smart contracts in post-trade activities can eliminate the
need for intermediaries, because peers rather than an intermediary handle
transaction confirmations. The expenses associated with intermediaries,
such as those for trade record keeping, audits, and trade verifications,
decrease as the number of intermediaries in the system falls.


6.6.3 Financial Exchanges in Blockchain Technology 


Decentralized bitcoin exchange providers have multiplied in recent years.
Blockchain exchanges allow for quicker and more affordable transactions.
Investors also have more control and security, because a decentralised
exchange does not require them to deposit their assets with a centralised
authority. Although cryptocurrencies are the main focus of
blockchain-based exchanges, the idea could also be applied to more
conventional investments.


6.6.4 Blockchain in Real Estate 


Real estate transactions require a great deal of paperwork to transfer
deeds and titles to new owners, verify financial information, and verify
ownership. Historically, real estate technology has focused on listings.
Blockchain can help to simplify and secure the process of buying and
selling properties. For purchasers, this means being able to research the
ownership and history of a property. For sellers, it means having greater
information about the sale process. Recording real estate transactions on
a blockchain may offer a more secure and convenient way of verifying and
transferring ownership. Blockchain also introduces new ways to trade real
estate and can enable trading platforms and online marketplaces to support
real estate transactions more fully. This may speed up transactions,
reduce paperwork, and save money, because blockchain enables digital asset
transfers and does away with the need for paper contracts or other
physical documentation.
Additionally, blockchain facilitates secure data sharing, makes it easier
to collect rent and pay it to property owners, and supports better due
diligence across the portfolio. This increases operational efficiency
while saving time and money, and it also generates much more data to help
with decision-making. Another benefit of blockchain is its high level of
security. Since every transaction is recorded on a distributed ledger and
is unchangeable, buyers and sellers can be confident that the process is
safe and secure.


6.6.5 Blockchain in Government 


The hyper-connectivity now visible in the world around us has resulted
not only in more data but also in a significant shift in how the economy
operates and interacts. In this era of constant economic change, the
government must transform itself so that it is truly centred on its
citizens by becoming more open, effective, cost-conscious, and responsive
in real time. Meeting this new demand requires a change that would shake
up a bureaucratic government agency, and for this shift to occur, the
introduction of a secure blockchain architecture and other aspects of this
technology must be encouraged. A decentralised government centre has many
advantages: it makes government entities more effective, both in how they
operate and in how they are regarded by the general public. Another
application of blockchain-stored digital identities is the administration
of government benefits such as welfare programmes, Social Security, and
Medicare. Using blockchain technology could reduce fraud and operational
costs, while beneficiaries could receive funds more quickly thanks to
blockchain-based digital disbursement.

Blockchain, with its unique features, offers a solution to this issue.
Transparency, the key component of blockchain applications, changes
public attitudes to government by enabling users to access and validate
data. By enabling citizens to independently verify the claims made by the
government, blockchain solutions accelerate the entire problem-solving
process. When used properly, blockchain technology can reduce costs while
also preventing duplication of effort, streamlining workflows, boosting
security, lessening the burden of audits, and helping to ensure that data
integrity is preserved. Many government processes, including taxation and
voting, may be simplified thanks to the increased transparency and
security that blockchain technology can provide.


6.6.6 Other Opportunities in the Industry


Blockchain can provide access to the banking and payment sectors for
billions of users worldwide, including those in developing countries who
do not have access to traditional banking. Several financial organisations
are using blockchain technology to streamline, improve, and safeguard
their processes. Investment in blockchain initiatives and enterprises
related to banking and payments is rising.


6.7 Difficulties 


Future prospects for blockchain technology include both benefits and
difficulties. Although considerable, the difficulties may be resolved as
the technology develops and advances.

Scalability and Security - The blockchain keeps growing as more people
use it and more transactions take place every day. In open networks like
public blockchains, this is a constant issue. Privacy is also reduced in
decentralized systems that replicate data throughout their network.
Despite these difficulties, the integrity of blockchains remains their
main strength.


6.8 Conclusion 


Blockchain has attracted considerable interest of late, and it will
support a wide range of uses. Blockchain provides greater security for
valuable financial transactions. The technology was originally envisioned
to handle bitcoin transactions, and bitcoin remains its most widely used
application. Blockchain applications also include smart contracts,
Ethereum, and distributed ledgers. Blockchain transactions can be faster
and more economical than those of conventional alternatives. Blockchain
can also improve safety measures, especially for sensitive data.
Blockchain applications frequently benefit from its transparency and
immutability.


Abstract

Artificial Intelligence (AI) incorporates human intelligence into machines and
builds statistical models for providing solutions in various domains such as
cybersecurity. Though AI-based techniques for detecting cyberattacks and threats
are more efficient than traditional cybersecurity techniques, they lack
explainability. These methods provide black-box solutions that are neither
transparent nor interpretable, so the steps involved in reaching specific
predictions cannot be understood. This reduces the confidence of users in the
models used for cybersecurity, particularly at present, when cyberattacks are
becoming increasingly diverse and complicated. To overcome this, Explainable
Artificial Intelligence (XAI) is applied, which may eventually replace the
traditional artificial intelligence, machine learning, and deep learning
algorithms that operate as a black box. Given that cyberattacks are increasing
day by day and traditional AI algorithms are not sufficient for providing
security, it is essential to focus more on XAI when applying AI algorithms. This
chapter provides a detailed discussion of explainable artificial intelligence
and how it is applied in providing cybersecurity.


7.1 Introduction


In recent days, more and more network attacks and mechanisms to steal 
data, damage reputations, hinder work and gain material advantage are 
being witnessed. Cybersecurity is the practice of defending and restoring 
networks and information systems from violations of fundamental secu- 
rity requirements such as data confidentiality, integrity, availability, and 
authenticity [9]. As the Internet becomes an indispensable tool in day-to- 
day life, the number of networked systems continues to grow. Also, the 
advances in computer networks and mobile devices have considerably 
increased Internet usage. This widespread use of the Internet has also lured 
cyber attackers to build more refined and potent cyberattack methods to 
their advantage. Various tools and techniques related to cybersecurity [6, 
7] are designed to combat threats that target the networked systems and 
applications present in any organization. 

To ensure the confidentiality, availability, and integrity of information
transmitted on the Internet, a well-established security system must be
put in place. Such systems are implemented to protect users and reputable
organizations from financial extortion that hinders normal business
operations. It is therefore essential to adopt intelligent, effective,
and efficient countermeasures. In addition, traditional cyber defense
mechanisms are challenged by the ever-increasing amount of information
circulating on the Internet [14]. Cyber attackers, on the other hand,
have been striving to stay ahead of law enforcement by developing new,
intelligent, and sophisticated attack techniques and by adopting
technological advancements including AI [37]. As a result, cybersecurity
researchers have started to explore AI-based approaches to improve
performance.


7.1.1 Use of AI in Cybersecurity 


AI techniques have delivered impressive performance on benchmark datasets
in a range of cybersecurity applications such as intrusion detection,
fraud detection, and malicious application identification [8]. Recently,
machine learning-based systems have outperformed humans in multiple
domains, including defending cyberspace. Machine learning algorithms are
used to detect anomalies and threats related to security vulnerabilities
[4]. Modern information defense systems and cyber systems integrate ML
methods for detecting attacks and limiting the damage they cause.


7.1.2 Limitations of AI


Generally, AI-based security systems produce false-negative and
false-positive results. The former, in which a real attack goes
undetected, can lead to wrong decisions, while the latter can lead to
false alarms [24]. To deal with these situations, certain improvements
should be made to keep the decisions explainable and reasonable [12]. If
the system fails to understand and learn from cybersecurity attacks, then
cybersecurity becomes a black box with pervasive quadratic risk,
irrespective of how powerful and accurate the AI-based system is [36].
The growing adoption of intelligent black-box systems in high-risk
environments is severely hampered by the need for transparency. This
becomes a bigger problem as machine learning models become more complex.
The interpretability of machine learning models is critical for data
scientists, researchers, and developers to understand the models, their
value, and the accuracy of their results [26]. Thus, interpretation is
required to investigate false positives, identify systematic deviations
and errors, and ultimately make informed decisions for future
improvements.
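
To make the two error types concrete, the following Python sketch counts
false positives and false negatives for a detector and derives the
corresponding rates. The label encoding (1 for attack, 0 for benign), the
function names, and the sample labels are assumptions made for this
illustration and do not come from any dataset discussed in this chapter.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = attack, 0 = benign)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1  # false alarm
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1  # missed attack
    return tp, fp, tn, fn


def error_rates(y_true, y_pred):
    """Return (false-positive rate, false-negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Hypothetical ground truth and detector output for eight events.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
fpr, fnr = error_rates(y_true, y_pred)
```

In this sample, one of five benign events raises a false alarm and one of
three attacks is missed; explanations are needed to find out why each of
these individual decisions went wrong.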


7.1.3 Motivation to Integrate XAI to Cybersecurity 


Among the above-mentioned limitations of AI-based approaches, the
black-box nature is a serious issue. It makes the decisions of
cybersecurity systems too complex for people to understand how the output
is generated. Hence, for the decisions of cybersecurity systems to be
trusted, AI must be transparent and explainable. To meet this
requirement, several strategies have been proposed to make AI decisions
more understandable to humans. This explainable technology is referred to
as Explainable Artificial Intelligence (XAI). XAI works by making the
results produced by AI-based statistical models interpretable and
enabling researchers and experts to understand causal reasoning and
primary data evidence [20]. XAI gives experts a logical understanding of
the data and the results obtained. The main motivation for integrating
XAI into cybersecurity is to develop trust and to improve transparency,
understandability, and justifiability.

In healthcare, for example, the implementation of XAI first enables
machines to analyze data and come up with meaningful results. Second, it
allows physicians to obtain decision-line information that explains how
a particular decision was made.


7.1.4 Contributions


This chapter explains the motivation for integrating XAI into AI-based
cybersecurity models and provides a comprehensive review of
state-of-the-art XAI applications in the cybersecurity area. It
extensively discusses the following topics:


i. Various forms of cyberattacks
ii. XAI and its categorization
iii. Frameworks for the XAI-based cyber defense mechanism
iv. Applications of XAI in cybersecurity
v. Challenges of XAI applications in cybersecurity
vi. Future research directions


7.2 Cyberattacks 


Cyberattacks have become more sophisticated as our society has evolved
and become more interconnected. As data breaches become more common, it
is critical to have a thorough understanding of modern cyberattacks.
Zhang et al. [37] have elaborated on the various cyberattacks. The types
of cyberattacks addressed by XAI-based defensive solutions are discussed
in this section. Figure 7.1 depicts the various forms of cyberattacks.

Figure 7.1 Various forms of cyberattack.


7.2.1 Phishing Attack


A phishing attack occurs when a malevolent actor sends messages posing as
a reliable source. The intention is to steal private data from the
victim’s computer, including login and credit card information. Phishing
is an online fraud technique that is becoming increasingly widespread and
takes many forms.


7.2.1.1 Spear Phishing 


Spear phishing is a scam involving an email or other electronic
communication in which the attacker tries to take money from a person, a
company, or an organization. Attackers install malicious software on the
victim’s PC in order to access the victim’s data.


7.2.1.2 Whaling 


Whaling is the term used to describe a cyberattack that targets the CEO
or CFO of an organization. Attackers directly target senior or other
important people, often by impersonating other senior or essential
members of the organization. The aim is to steal money or secret data, or
to gain access to the victim’s computer in order to commit further
cybercrimes.


7.2.1.3 Smishing


Phishing with text messages is referred to as “SMS phishing” or
“smishing.” People targeted by a smishing attack are sent text messages
that seem to come from a reliable source. Voice phishing is a related
type of phishing attempt that occurs over the phone, and phishing
scammers are increasingly using Voice over IP technology to communicate
with their victims. Tripwire discovered a smishing campaign posing as the
US Postal Service. The senders of the malicious SMS messages directed
their victims to click on a link to learn more about an upcoming US
Postal Service delivery. The malicious link took users to a number of
websites designed with the sole purpose of stealing their Google account
information.


7.2.1.4 Pharming 


Using the social engineering technique known as “pharming,” attackers
redirect users who are trying to access a specific website to a phoney
one. Through these fraudulent websites, malware can be placed on the
victim’s computer to steal login credentials and personal information.
Cybercriminals often target financial organizations in this way.


7.2.2 Man-in-the-Middle (MITM) Attack 


This attack happens when an attacker intrudes on a conversation between
two parties and impersonates one or both of them, making the exchange
look like a normal data transfer.


7.2.2.1 ARP Spoofing 


ARP spoofing, sometimes referred to as ARP poisoning, is a type of MITM
attack that enables attackers to eavesdrop on communication between
network devices. By exploiting weaknesses in the ARP protocol, the
attacker can poison the IP-to-MAC address mappings held by other devices.
Using easily accessible tools, a malicious attacker can contaminate the
ARP caches of other computers on a local network, filling them with false
information.
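
One simple defensive check is to look for an IP address that has been
announced with more than one MAC address. The Python sketch below assumes
that ARP observations have already been collected as (IP, MAC) pairs; the
function name find_arp_conflicts and the sample addresses are hypothetical,
and a real monitor would also need to allow for legitimate address changes.

```python
from collections import defaultdict


def find_arp_conflicts(observations):
    """Return IPs that were announced with more than one MAC address."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }


# Hypothetical observations taken from ARP replies on a local network.
observations = [
    ("192.168.1.1", "aa:bb:cc:00:00:01"),
    ("192.168.1.20", "aa:bb:cc:00:00:14"),
    ("192.168.1.1", "de:ad:be:ef:00:99"),  # gateway IP claimed by a new MAC
]
conflicts = find_arp_conflicts(observations)
```

Here the gateway address 192.168.1.1 is flagged because two different MAC
addresses claim it, which is the pattern a poisoned ARP cache would show.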


7.2.2.2 DNS Spoofing 


DNS cache poisoning or DNS spoofing is an extremely cunning cyberat- 
tack that involves poisoning the DNS cache in order to route web traffic to 
phishing websites. By constructing false websites that look like the user’s 
intended destination, hackers can easily trick people into providing per- 
sonal information. 


7.2.2.3 HTTPS Spoofing 


In HTTPS spoofing, the URL of the attacker’s site differs only slightly
from the URL of a genuine, legitimate site. By registering a URL that
closely resembles the user’s intended URL, hackers are able to obtain
personal information.
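
A lookalike address of this kind can be flagged by comparing the host name
of a URL with a list of trusted hosts. The sketch below uses the standard
difflib and urllib.parse modules; the function name looks_spoofed, the
similarity threshold of 0.85, and the example domains are assumptions made
for illustration rather than a recommended production rule.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def looks_spoofed(url: str, trusted_hosts, threshold: float = 0.85) -> bool:
    """Flag hosts that are very similar, but not identical, to a trusted host."""
    host = (urlparse(url).hostname or "").lower()
    if host in trusted_hosts:
        return False  # exact match with a trusted host
    return any(
        SequenceMatcher(None, host, trusted).ratio() >= threshold
        for trusted in trusted_hosts
    )


trusted = {"example-bank.com"}  # hypothetical trusted host
suspicious = looks_spoofed("https://examp1e-bank.com/login", trusted)
genuine = looks_spoofed("https://example-bank.com/login", trusted)
unrelated = looks_spoofed("https://weather.org/today", trusted)
```

Only the first URL is flagged: it differs from the trusted host by a
single character, whereas the second is an exact match and the third is
not similar enough to be a plausible imitation.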


7.2.2.4 Wi-Fi Eavesdropping 


It is the practice of hackers eavesdropping on wireless communications on 
unprotected networks or setting up networks with catchy names to lure 
users into connecting so they may steal the login information they send 
over that network. 


7.2.2.5 Session Hijacking


This occurs when the attacker waits for a person to log into a web page,
then takes their session cookie and uses it to enter the same account
from the attacker’s own browser. This attack is also known as cookie
hijacking.


7.2.3 Malware Attack 


Malware is a term used to describe malicious software such as ransomware,
spyware, viruses, and worms. A machine may be infected with malware when
a person accidentally clicks on a malicious link or opens an email
attachment. Malware can restrict access to critical network resources and
can gather and transmit sensitive information without the user’s
knowledge. It also has the potential to disable a large number of system
components, rendering the machine completely inoperable [33].


7.2.3.1 Ransomware 


Ransomware is a form of malware that demands payment and is regarded as
one of the most dangerous varieties. The victim’s data is encrypted, and
decrypting it costs money. One of the most common causes of infection is
people unwittingly installing this type of malware through email
attachments or links from dubious websites. Once installed, ransomware
may give hackers backdoor access to a device, enabling them to encrypt
the data on the target device and prevent its owner from decrypting it
until a ransom is paid. Because it demands ransom payments in digital
currency, it is also known as crypto-malware. In short, ransomware can
commandeer machines, encrypt data, and ruin the victim’s finances [31].


7.2.3.2 Spyware 


Spyware is malware that accesses computers without the owner’s consent.
This is typically done with the intention of collecting user credentials,
spying on Internet activity, or acquiring private information that could
be used fraudulently. The term “spyware” covers a wide range of
undesirable programmes, including adware, Trojan malware, and even cookie
trackers. The keylogger is among the most commonly used kinds of spyware:
it records each keystroke performed on the keyboard and stores the data a
person enters. In short, spyware may invade users’ privacy, gather
sensitive information, steal data, or steal their identity.


7.2.3.3 Botnet 


A botnet relies on a “spider” programme that scans the internet for
security holes and starts exploiting them, so the intrusion happens
automatically. It infects devices by inserting malicious code into them.
Botnets can be used to hack into devices, to perform distributed
denial-of-service (DDoS) attacks, and to capture activity such as
keystrokes, camera images, or screenshots. Hackers use a botnet to gain
remote access to a computer.


7.2.3.4 Fileless Malware 


Fileless malware is installed and run using the software, programmes, and
protocols that are native to or built into a device’s operating system.
It is memory-based and does not require any file to be downloaded. It
keeps wreaking havoc as long as legitimate programmes are running. Due to
its stealthiness, it is hard to detect. Fileless malware can therefore
evade antivirus programmes and steal data.


7.2.4 Denial-of-Service Attack 


DoS attacks are malicious, targeted attacks that flood a network with
fraudulent requests in an effort to disrupt business operations. As a
result, legitimate users are barred from resources that are kept on a
system or in a network. These attacks do not usually cause data loss.


7.2.5 Zero-Day Exploit 


Zero-day vulnerabilities are security flaws that have only just been
found by a user or developer and for which no fix is yet available, so
they can be exploited by hackers. A zero-day attack happens when hackers
take advantage of such a vulnerability before engineers can fix it.


7.2.6 SQL Injection 


Malicious SQL code is used to manipulate back-end databases and gain 
access to data that is not intended for display. This harmful code 
injection method is called SQL injection (SQLi). Details such as private 
customer information or critical business information may be compromised. 
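
The mechanism can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table, its contents, and the function names lookup_unsafe and lookup_safe are hypothetical and exist only for this illustration; the point is the contrast between pasting user input into the SQL text and binding it as a parameter.

```python
import sqlite3

# Hypothetical in-memory table, created only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def lookup_unsafe(name):
    # Vulnerable: the user input becomes part of the SQL text itself.
    query = f"SELECT secret FROM customers WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(name):
    # Safer: the driver binds the value, so it is never parsed as SQL.
    query = "SELECT secret FROM customers WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


payload = "nobody' OR '1'='1"
leaked = lookup_unsafe(payload)   # the injected OR clause matches every row
blocked = lookup_safe(payload)    # no customer has this literal name
```

With the string-built query, the payload turns the WHERE clause into a condition that is always true, so every secret is returned; with the bound parameter, the same payload is treated as an ordinary (non-matching) name.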


7.3 XAI and Its Categorization 


Van et al. [35] coined the term XAI to describe the ability of a system 
to explain the behavior of AI-driven entities in gaming applications. The 
goal of XAI is to make it easier for end users to understand the results of 
AI. According to DARPA, XAI’s goal is to generate more explainable models 
and enable stakeholders to better understand and appropriately trust a 
new generation of AI partners [31]. 

The research in AI has shifted to building models and algorithms that 
emphasize predictive power. Researchers and practitioners have recently 
started paying attention to XAI. XAI is a set of techniques and methods 
that help researchers understand and rely on the results and conclusions of 
machine learning models. XAI helps researchers understand the accuracy, 
rationality, transparency, and effectiveness of AI-assisted decision-making 
by comparing it to other decisions. The output needs to be interpretable in 
order to be credibly adjusted. 

The terms explainability, interpretability, transparency, and intelligibility 
are used to characterize XAI; the relation between these terms is depicted 
in Figure 7.2. 


Figure 7.2 Relation between the terms of XAI. 


Figure 7.3 XAI categorizations. 


There are numerous ways to structure XAI categories. Some of the 
categorization techniques may overlap, and a particular XAI methodology 
may fall under more than one category. XAI is therefore categorized from 
several perspectives, as shown in Figure 7.3, which describes the traits of 
XAI approaches at various levels. 


7.3.1 Intrinsic or Post-Hoc 


This categorization distinguishes whether explainability is achieved by 
restricting the complexity of the AI model (intrinsic) or by analysing the 
model after training (post-hoc). Using the data generated by the 
prediction model, an intrinsic XAI technique generates the explanation 
along with the prediction. Because of their inherently self-explanatory 
nature, some ML models, such as decision trees and sparse linear models, 
are recognized as intrinsic XAI techniques. Post-hoc explanations, on the 
other hand, apply interpretation techniques after the model has been 
trained and its decisions have been made. Typical post-hoc techniques, 
which work independently of the model as an external interpreter, include 
LIME and Permutation Importance. 
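
As a rough illustration of a post-hoc technique, the sketch below implements permutation importance in plain Python: each feature column is shuffled in turn and the resulting increase in prediction error is taken as that feature's importance. The function black_box is a hypothetical stand-in for an already trained model, and the data are synthetic; neither comes from the chapter.

```python
import random


def black_box(row):
    # Hypothetical trained model: depends strongly on x0, weakly on x1,
    # and not at all on x2.
    x0, x1, x2 = row
    return 3.0 * x0 + 0.5 * x1


def mean_squared_error(model, rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Average increase in error when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild every row with the shuffled value in position j.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_squared_error(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random(), data_rng.random()) for _ in range(200)]
targets = [black_box(r) for r in rows]
importances = permutation_importance(black_box, rows, targets)
```

The procedure only calls the model on inputs and reads its outputs, which is why it can be applied after training without looking inside the model.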
7.3.2 Model-Specific or Model-Agnostic 


XAI methods are also categorized by the models to which they can be 
applied: model-specific or model-agnostic. Model-specific tools are 
unique to a particular model or family of models. For example, the graph 
neural network explainer is a technique for explaining graph-based ML 
problems that are solved with a GNN. Model-agnostic explanation tools, on 
the other hand, can be used with any ML model. They usually analyse 
feature inputs and outputs rather than the internal data of the model, such 
as weights or structural information. Model-agnostic tools include LIME, 
Permutation Importance, and the kernel-based variant of SHAP. Gradient-based 
tools such as Grad-CAM and saliency maps need access to a neural network's 
internals and are therefore better regarded as model-specific. 


7.3.3 Local or Global 


Depending on the scope of the explanation, explanations can be local 
or global. Local explainability is the ability of a system to explain to a 
user why a particular decision was taken. This group includes popular 
explainability techniques such as LIME [25], SHAP [19], and counterfactual 
explanations. Global explainability, in contrast, relates to the 
explanation of the learning algorithm as a whole, taking into account the 
training data used, the algorithm's appropriate uses, and any warnings 
about the algorithm's shortcomings and improper applications. GAM has been 
proposed as a global explanation method that explains the distribution of 
neural network predictions across subpopulations. 


7.3.4 Explanation Output 


The format of the explanation output has a significant impact on some 
users, making it another essential element of XAI categorization. For 
instance, text-based explanation techniques are frequently used in Natural 
Language Processing (NLP) to capture fine-grained information and produce 
comprehensible explanations. Visual explanation techniques, on the other 
hand, are employed in a wider range of fields, such as neural networks, 
NLP, and healthcare. In practice, most feature summary statistics can also 
be visualized, and some feature summaries can only be understood through 
visualization. Argument-based explanations describe the features in the 
style that people use to make judgments, which helps people better 
understand the relevance of a feature. Model-based explanation approaches 
must describe the internal working logic of a black-box model. 
7.4 XAI Framework 


As stated earlier, many machine learning algorithms are a black box to 
their users: they do not reveal what processing is carried out for a 
particular input or how exactly the output for that input is obtained. 
Understanding the structure and functioning of machine learning algorithms 
is essential before they are used, especially in security fields. The 
frameworks of explainable artificial intelligence provide a suite of 
machine learning techniques that yield more explainable models with high 
performance and enable users to understand the underlying concepts and 
techniques of the black box [29]. Figure 7.4 summarizes a few of the most 
well-known XAI frameworks, which are also detailed in this section. 


7.4.1 SHAP (SHapley Additive exPlanations) and Shapley Values 


SHAP [19] is one of the best-known tools for explaining prediction 
models in detail; it lets the contribution of each predictor to the final 
output be examined and visualized. It can be applied to simple ML 
algorithms such as linear regression, logistic regression, and decision 
trees, and also to more complex deep learning architectures used for image 
classification, natural language processing, etc. Many variants of SHAP are 
available. Figure 7.5 shows the evolution of SHAP models that have been 
designed based on their performance on different machine learning 
algorithms. 

Figure 7.4 XAI framework. 

The most important component of SHAP is the Shapley value [13]. The basic 
idea of the Shapley value is inspired by game theory, where each player’s 
contribution towards the end result of the game is computed. Similarly, in 
the SHAP architecture, Shapley values tell us how the prediction is 
distributed among the available features of a prediction model. The method 
is based on the assumption that each feature in the model works together 
with the other features to produce the output. The Shapley value is 
mathematically defined as follows: 


\[
\phi_i = \sum_{S \subseteq K \setminus \{i\}} \frac{|S|!\,(|K| - |S| - 1)!}{|K|!} \bigl( f(S \cup \{i\}) - f(S) \bigr) \tag{7.1}
\]


where \phi_i = Shapley value of feature i, 
K = set of all features, 
S = subset of features that does not contain feature i, 
f(S) = predicted output of the model when only the features in S are used. 


7.4.1.1 Computing Shapley Values 


A brief description of how SHAP [20] works to explain the underlying 
black-box algorithm is given below. 


1. From the given data, all possible feature combinations S are 
selected. They are called coalitions. 

2. The average model prediction is computed. 

3. For each coalition, calculate the difference between the average 
prediction and the prediction made by the model without feature i. 

4. For each coalition, calculate the difference between the average 
prediction and the prediction made by the model with feature i. 

5. Compute the difference between the values obtained in step 3 and 
step 4. This is the marginal contribution of feature i. 

6. The average of all the values computed in step 5, weighted as in 
Equation (7.1), gives the Shapley value of feature i. 
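
The steps above can be sketched in a few lines of plain Python that evaluate Equation (7.1) exactly by enumerating every coalition. The function toy_model and the feature names are hypothetical: they stand in for a real model evaluated on a coalition of known features, with 10.0 playing the role of the average prediction when no feature is known. This is not the implementation used by the SHAP library.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, f):
    """Exact Shapley values; f maps a frozenset of features to a prediction."""
    n = len(features)
    phi = {}
    for i in features:
        others = [k for k in features if k != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|K| - |S| - 1)! / |K|! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition S.
                total += weight * (f(s | {i}) - f(s))
        phi[i] = total
    return phi


def toy_model(coalition):
    # Hypothetical intrusion score with an interaction between two features.
    score = 10.0
    if "failed_logins" in coalition:
        score += 30.0
    if "packet_rate" in coalition:
        score += 20.0
    if "failed_logins" in coalition and "packet_rate" in coalition:
        score += 10.0
    return score


features = ["failed_logins", "packet_rate", "port"]
phi = shapley_values(features, toy_model)
```

In this toy case the interaction bonus of 10 is split equally between the two interacting features, the unused feature receives zero, and the values sum to the difference between the full prediction and the baseline.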


Figure 7.5 illustrates the evolution of SHAP models. A significant 
limitation of Shapley-value-based architectures is that they attribute 
additive contributions to the explanatory variables. Thus, if the model 
being explored is non-additive, the Shapley values may be misleading. The 
computation of Shapley values is also quite time consuming, because the 
number of coalitions grows exponentially with the number of features. 
However, subsampling methods can be used to reduce this cost. 
Figure 7.5 Evolution of SHAP models. 
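
One common subsampling approach, shown here only as a sketch, estimates Shapley values from randomly drawn feature orderings instead of enumerating every coalition. The function names, the number of permutations, and toy_model are assumptions made for this illustration; the chapter does not specify which subsampling method it has in mind.

```python
import random


def toy_model(coalition):
    # Hypothetical intrusion score with an interaction between two features.
    score = 10.0
    if "failed_logins" in coalition:
        score += 30.0
    if "packet_rate" in coalition:
        score += 20.0
    if "failed_logins" in coalition and "packet_rate" in coalition:
        score += 10.0
    return score


def sampled_shapley(features, f, n_permutations=2000, seed=0):
    """Monte Carlo estimate: average marginal contribution over random orderings."""
    rng = random.Random(seed)
    phi = {k: 0.0 for k in features}
    for _ in range(n_permutations):
        order = features[:]
        rng.shuffle(order)
        coalition = frozenset()
        previous = f(coalition)
        for k in order:
            # Add features one at a time and credit each with the change it causes.
            coalition = coalition | {k}
            current = f(coalition)
            phi[k] += current - previous
            previous = current
    return {k: v / n_permutations for k, v in phi.items()}


features = ["failed_logins", "packet_rate", "port"]
estimate = sampled_shapley(features, toy_model)
```

The cost now grows with the number of sampled orderings rather than with the number of coalitions, at the price of a sampling error that shrinks as more orderings are drawn.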


7.4.2 LIME - Local Interpretable Model-Agnostic Explanations 


As the name suggests, LIME [22, 34] is model-agnostic: it treats any 
supervised model as a black box and provides an explanation for it. The 
explanations it provides are local, i.e., they are built from samples in 
the vicinity of the observation being explained. The working principle of 
LIME is broadly similar to that of SHAP, the major difference being the 
execution time. LIME is best suited to, but not limited to, explaining 
tabular data, image, and text classifiers. Its explanations are a concrete 
implementation of local surrogate models: intrinsically interpretable 
models that are trained to approximate the predictions of the underlying 
black-box model. 

Given a test sample and a prediction model, the two main steps of LIME 
are sampling around the given data to obtain a surrogate dataset and 
selecting features from that surrogate dataset. The weight of each row of 
the surrogate dataset is then determined by how closely the row resembles 
the original sample. 

Mathematically, LIME is described as: 


\[
\mathrm{Detail}(x) = \arg\min_{d \in D} \bigl\{ L(c, d, \pi_x) + \Omega(d) \bigr\} \tag{7.2}
\]

where d = explanation model for a particular instance x, 
D = set of all possible explanation models, 
L = loss function computed between the explanation and the prediction, 
c = actual black-box model, 
\pi_x = proximity measure around x, 
\Omega(d) = complexity of the explanation model. 
7.4.2.1 Working of LIME 


Below is a basic description of how LIME operates to explain the 
black-box model underneath. 


1. For a given data instance, perturb the data n times to create 
replicated feature data. The perturbed data are slight modifications 
of the actual data and are used by LIME to create a local linear model. 

2. Predict the outcome for the perturbed data. 

3. Compute the distance between each perturbed sample and the actual 
data. 

4. Convert the computed distance into a similarity score. 

5. From the perturbed data, select the features that best describe the 
predictions. 

6. Fit a simple model to the perturbed data, using the selected features 
and weighting each sample by its similarity score. 

7. The coefficients of this simple model are the explanation of the 
observation. 
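
The steps above can be sketched for a two-feature instance in plain Python. This is a simplified illustration under stated assumptions, not the LIME package itself: black_box is a hypothetical non-linear model, the perturbations are Gaussian, the similarity score is an exponential kernel with an arbitrarily chosen width, and feature selection (step 5) is skipped because only two features are present.

```python
import math
import random


def black_box(x):
    # Hypothetical non-linear model to be explained.
    return x[0] ** 2 + 3.0 * x[1]


def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lime_explain(model, instance, n_samples=500, scale=0.1, kernel_width=0.25, seed=0):
    rng = random.Random(seed)
    d = len(instance)
    # Step 1: perturb the instance n times.
    samples = [[v + rng.gauss(0.0, scale) for v in instance] for _ in range(n_samples)]
    # Step 2: predict the outcome for each perturbed sample.
    preds = [model(s) for s in samples]
    # Steps 3-4: distance to the instance, converted into a similarity weight.
    weights = [math.exp(-(math.dist(s, instance) ** 2) / kernel_width ** 2) for s in samples]
    # Step 6: weighted least squares via the normal equations (intercept + d slopes).
    rows = [[1.0] + s for s in samples]
    ata = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(d + 1)]
           for i in range(d + 1)]
    aty = [sum(w * r[i] * y for w, r, y in zip(weights, rows, preds)) for i in range(d + 1)]
    beta = solve(ata, aty)
    # Step 7: the slopes are the local explanation.
    return beta[1:]


coefficients = lime_explain(black_box, [2.0, 1.0])
```

Around the instance (2, 1) the fitted slopes come out close to the local gradient of the toy model, roughly 4 for the first feature and 3 for the second, which is what a local linear surrogate is expected to recover.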


Various proposals exist for how LIME should handle explanatory 
variables, which leads to different implementations of LIME and, as a 
result, to different explanations for the same model [25]. The explanation 
may also be misleading at times, because LIME does not control the quality 
of the local fit to the data. 

Another important issue in LIME is that high-dimensional data are mostly 
sparse, so it can be difficult to define the “local neighborhood” of the 
relevant instance precisely. Even a slight change in the neighborhood can 
strongly affect the explanations. 


7.4.3 ELI5 


ELI5 [18] stands for “Explain Like I’m 5”. It is a popular Python 
package used to understand many ML algorithms. It is mostly used with 
scikit-learn regression and classification models, Keras, CatBoost, etc. 
Its built-in functions show how a particular decision is made in a 
classification or regression problem. ELI5 computes the weight of each 
feature and shows the contribution of each feature to the predicted or 
classified output. 
7.4.4 Skater 


Skater [11] is another open-source Python library for looking inside the 
black box of any learned model. The library can produce both local 
interpretation, pertaining to individual predictions, and global 
interpretation, pertaining to the entire dataset. 

Skater supports many algorithms. Depending on their scope of 
interpretation, the algorithms are broadly classified as shown in 
Figure 7.6. 


7.4.5 DALEX 


Like the previous frameworks, DALEX [2] is a Python library developed for 
explanatory model analysis, and it can be used for both classification and 
regression in ML. The DALEX package adds an abstraction layer to models, 
enabling interaction with various models in a consistent manner. 

DALEX comes with companion packages with which the relationship between 
the model input and the model output can be easily examined. 

One of the important companion packages is DALEXtra. It consists of 
various tools that aid in inspecting and improving models. The two main 
functionalities of DALEXtra are listed and described below. 


1. Champion-Challenger analysis 
This functionality of DALEXtra helps us to compare two or more 
machine-learning models, decide which is superior, and then enhance 
both of them. 

2. Cross-language comparison 
With this functionality, explainers can be created for models that were 
developed in different languages. 


Figure 7.6 Skater algorithms. 


7.5 Applications of XAI in Cybersecurity 


Machine learning prediction models are trained automatically, without 
knowledge of the domain in which they are applied. Hence, they are known 
as black boxes. This opacity in the model-building process creates many 
unpredictable risks. For example, model performance may drop because of 
out-of-domain inputs or data drift, or the behavior learned from 
historical data may be unfair. Situations in which the black box fails 
arise in many domains, which has led to increased interest in XAI 
methods. 

As the frequency of devastating cyberattacks has increased, establishing 
and enhancing cybersecurity has become a massive social challenge. To 
address it, timely, prominent, and actionable intelligence [5] against the 
threats is developed to enable effective decisions against cyberattacks. 
In this section, various applications of explainable AI for preventing 
cyberattacks and techniques for providing cybersecurity are discussed. 
Some of the applications of XAI in cybersecurity are shown in Figure 7.7. 
Figure 7.7 XAI in cybersecurity applications. 


7.5.1 Smart Healthcare 


Research in smart health uses artificial intelligence and machine 
learning to analyze physiological data and assess the mental health of 
individuals. Human-in-the-loop AI systems enable us to identify mental 
illness and give proper treatment. Early detection of mental illness 
substantially reduces the damage it causes. Many malicious threats target 
these Human-in-the-loop systems. XAI could help to solve these 
cybersecurity issues so that robust and secure Human-in-the-loop systems 
for handling mental illness can be developed. 

A key challenge for XAI systems is to maintain the integrity of the deci- 
sion-making process. To achieve this, the personnel have to take direct 
control of the process. The attacker can insert malicious input and train 
the model to provide an adversarial solution. XAI has provided efficient 
solutions to analyze and identify the attacks and provide cybersecurity 
solutions [32]. 


7.5.2 Smart Banking 


Financial cybercrimes have been growing in recent years. In order to for- 
tify cybersecurity in financial sectors, it is mandatory to reduce the risk 
score involved in digital transformation on cloud environments. This pro- 
vides the banking and various financial sectors an unparalleled agility and 
protection to reduce cyber risk. By integrating information security into 
the infrastructure, data and assets are protected. This helps to establish and 
stabilize financial regulations and compliance program activities. 

XAI cybersecurity solutions incorporate the necessary policies and 
controls to decrease the risk involved in financial sectors. XAI-oriented 
fraud detection and regulatory compliance tools provide layered 
cybersecurity to counter cyberattacks [1]. 


7.5.3 Smart Cities 


In the smart city domain, to enforce the desired service, an extended net- 
work of sensors is connected to extract data from various locations. Hence, 
the network infrastructure should consist of secure and reliable intercon- 
nected devices like actuators, sensors etc. The data from these devices are 
gathered, processed, and communicated to enable smart city services. The 
data from such a heterogeneous network has a great impact on enforcing 
cybersecurity in the smart city implementation. Since these tiny low-end 
devices are limited [28] in their processing and storage capacity, they 
may not be able to support robust security features such as authentication 
techniques or cryptographic encodings. This leaves smart city services 
open to possible cyberattacks. 

XAI provides solutions to prevent these attacks. It provides a 
transparent authentication mechanism on the devices, such as generating 
one-time passwords. Every activity requires the proper permission before 
the action is enabled. Software and firmware updates are made automatic 
and periodic. Logs are maintained in an encrypted format to prevent 
tampering [3]. Technology transfer is also made secure. Regular auditing 
and assessment, together with protection of the access control and logging 
environment, help to prevent potential cyber threats. 
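
The chapter does not say which one-time password scheme is meant. As one possible illustration, the sketch below implements the standard HMAC-based one-time password (HOTP, RFC 4226) with Python's standard library; the choice of HOTP is an assumption made here, and the secret is the published RFC test secret, which must never be used in a real deployment.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password as defined in RFC 4226."""
    # HMAC-SHA1 over the 8-byte big-endian counter.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


secret = b"12345678901234567890"  # RFC 4226 test secret, illustration only
first_code = hotp(secret, 0)
second_code = hotp(secret, 1)
```

Because the computation is a single HMAC over a counter, it is light enough for devices with limited processing and storage capacity, provided the shared secret itself is stored securely.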


7.5.4 Smart Agriculture 


The use of Internet of Things technologies, Information and Communication 
Technology and data analysis plays a major role in the implementation of 
smart agriculture and its operations. The objective of establishing smart 
agriculture technologies is to meet the growing demands to monitor crops, 
check soil fertility, automate soil testing whenever needed, etc. Smart 
agriculture depends on the smart networks forming cyber-physical sys- 
tems [16]. These systems enable communication with other devices such 
as sensors, processing control units, etc. The entire setup is controlled by 
computer and communication systems. These devices collect data such as 
moisture, weather fluctuations, fertilization and so on. This data is further 
analyzed and processed to improve the productivity of agriculture. All 
these activities increase the cyber threats as the data is collected from var- 
ious sources [27]. The XAI provides cybersecurity solutions to handle the 
physical attacks, attacks against authentication, replay attacks and mali- 
cious code attacks. 


7.5.5 Transportation 


Intelligent transportation systems face a big challenge in working with 
increased value of ITS data and connectivity [23]. There are many cyber 
risks involved in data collection and processing of data. 

Some of the major impacts of cyberattacks on transportation include: 


• The most important data files and information are blocked 
• Traffic lights and toll booths are disrupted 
• Payroll services are interrupted 
• Ticket machines and fare gates are interrupted 
• Sensitive information from emails is stolen 
• Personal information is stolen 


The XAI-based ITS architecture employs risk-reducing components 
that are extensible, interoperable, and can operate in degraded conditions 
in times of cyberattacks. 


7.5.6 Governance 


Governance is a set of wise decisions made by organizations to enable 
cybersecurity activities. The tenets of a cybersecurity programme offer 
a comprehensive picture of risk and active tracking of performance [10]. 
Organizations can leverage the advantages of functioning in the digital 
market with the help of strictly delineated cybersecurity governance. The 
effectiveness and durability of the digital business transformation are sup- 
ported by effective cybersecurity governance. Without sound cybersecu- 
rity governance, companies would struggle to keep the trust of external 
stakeholders or guarantee core business sustainability [21]. Implementing 
effective XAI for cybersecurity in governance would enable the following: 


• A cybersecurity vision that drives organizational decision-making in 
accordance with the overall plan. 

• Supervision and resource allocation via a platform and training for 
cybersecurity. 

• A comprehensive approach to risk assessment that takes cybersecurity 
risk into account and improves knowledge of the organization's exposure 
to cyber threats. 

• Duties and responsibilities for cybersecurity that are clearly defined, 
allocated, and incorporated into the enterprise. 

• A reliable mechanism for monitoring the implementation and reporting on 
progress. 


7.5.7 Industry 4.0 


Interconnected and smart technologies are now a part of work in 
organizations, and people even use wearable devices. This shows the 
benefit of emerging technologies, from artificial intelligence (AI), 
machine learning (ML), and robotics to quantum computing. The Internet of 
Things (IoT) and additive manufacturing have also benefited from smart 
technologies [17]. As Industry 4.0 becomes smarter day by day through 
interconnected devices, the cyber threats increase as well [15]. 

A few of the cyber risks in Industry 4.0 are listed below: 


• Critical applications and infrastructure are accessed and controlled 
by state adversaries that evade detection. 

• New sophisticated approaches by attackers intensify the malicious 
attacks on critical cloud infrastructure. 

• The latest vulnerabilities are adopted by targeted-attack and criminal 
actors, who exploit trusted sources and the supply chain. 

• Stolen identities and credentials are exploited by sophisticated 
adversaries. This leads to ransomware hunting attacks. 


To handle all these cyber threats, XAI applications enable us to define 
the solution based on the particular attack rather than a common solution 
for all the attacks. 


7.5.8 5G and Beyond Technologies 


Attacks by nation-state hackers are becoming more of a threat, 
especially for telecom companies, as recent incidents have shown. 
Additionally, cutting-edge technologies like 5G introduce new threats and 
vulnerabilities. However, it is important to note that 5G also offers 
significant advantages, particularly enhanced security features such as 
improved authentication and encryption [30]. The detailed requirements 
are currently being drawn up, and much will depend on how they are 
implemented in products and used by operators to handle cyberattacks. 
Despite this secured infrastructure, 5G technologies still suffer 
cyberattacks. XAI is used to model problem-oriented solutions to 5G 
cyberattacks. 


7.6 Challenges of XAI Applications in Cybersecurity 


XAI techniques have been reviewed above for the different attacks that 
occur and for the different cybersecurity domains. XAI is a powerful tool, 
but it also faces several challenges [37], which are discussed in detail 
below. 
7.6.1 Datasets 


The main problem with datasets is that they are not kept up to date. 
This can occur because of confidentiality and ethical issues. When 
recently observed cyberattacks are not included in the datasets used to 
establish cyberattack defense mechanisms, the resulting XAI application 
becomes ineffective. Since cyberattacks are becoming more complicated, the 
datasets have to be updated regularly. Another problem is the shortage of 
the large volumes of data needed by XAI training methods, which reduces 
the performance and explainability of XAI approaches. The information 
pertaining to cyberattacks and cyber industries is also redundant and 
imbalanced, and the heterogeneity present in the datasets is a further 
challenge for XAI models. These problems call for a closer look at the 
large benchmark datasets currently used for training and testing. 


7.6.2 Evaluation 


Evaluation plays an important role for XAI systems. The performance of 
cybersecurity systems is measured with metrics such as Precision, 
F1-Score, and ROC. XAI systems must additionally be able to assess the 
quality, value, and satisfaction of explanations. The challenges that XAI 
systems face here are more generic. XAI explanation evaluation measures 
are divided into two categories, namely, user satisfaction and 
computational measurements. Evaluation based on user satisfaction raises 
privacy issues because it depends on user feedback or interviews. For 
computational measurements, many researchers use inherently interpretable 
models, but in some cybersecurity domains the necessary computational 
resources and computational power are lacking. To support future 
improvements in XAI applications, a set of standard evaluation metrics is 
required. 
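
For reference, the performance metrics named above can be computed from predicted and true labels as in the following sketch. The labels are made-up values for illustration (1 = attack, 0 = benign), and the function name classification_metrics is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels: 4 real attacks, of which 3 are detected, plus 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = classification_metrics(y_true, y_pred)
```

These metrics score the detector's predictions only; they say nothing about the quality of the accompanying explanations, which is why separate explanation-quality measures are needed.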


7.6.3 Cyber Threats Faced by XAI Models 


XAI models face many cyberattacks that target the vulnerabilities of the 
explanation approaches, which is dangerous for cybersecurity systems. For 
instance, the most popular XAI explanation methods, such as LIME and SHAP, 
which are deployed in XAI applications for cybersecurity, can also be 
fooled. Most defensive approaches focus on securing the performance of the 
prediction results of the XAI models. It is equally important to retain 
the transparency and efficiency of the entire system while preventing 
cyberattacks. 


7.6.4 Privacy and Ethical Issues 


One of the crucial challenges faced by XAI models is privacy. 
Authentication details, emails, and passwords fall under a person's right 
to personal privacy. It is important to be cautious in ensuring that 
there is no discrimination, bias, or unfairness made by the XAI system and 
the explanations that go along with them. In the specific domain of cyber- 
security, XAI terms can be eliminated. The privacy and security-related 
concerns increase as the data is collected from various security-related 
sources and only authorized persons are provided access to XAI models. 


7.7 Future Research Directions 


One potential area for future research is the creation of both high-quality 
and updated datasets that can be used for XAI applications for cybersecu- 
rity. Research on the trade-off between performance and explainability of 
XAI techniques used in cybersecurity is lacking. Future research could 
focus on how to develop customer-centered XAI systems for cybersecurity 
to enhance customer understandability and performance without compro- 
mising security. Even though current studies on cyber threats and corre- 
sponding defensive mechanisms are focusing on the performance of AI 
models, the adversarial threats and defences against the explainability of 
XAI models still need to be explored. As privacy and ethical concerns have 
recently received attention, confidentiality and data protection are import- 
ant challenges in the field of cybersecurity. Future research may focus on 
the XAI systems’ generated explanations and data protection. 


7.8 Conclusion 


This chapter discusses the key insights regarding the use of XAI for 
cybersecurity. With ML models and the XAI framework, predictions can be 
understood and interpreted. AI applied to cybersecurity analyses datasets 
and keeps track of a variety of security vulnerabilities and fraudulent 
activity. The work presented here offers a state-of-the-art analysis of XAI 
in cybersecurity. The concept of cybersecurity is introduced first, 
stressing the many forms of cyberattacks and their effects. An XAI system 
boosts confidence in an XAI-based cybersecurity system by offering 
explanations. XAI explanations of how user data is used in algorithmic 
decision-making can educate end users. The visualization and 
explainability of the XAI system can help cybersecurity professionals 
assess the reliability and uncertainty of models. This was followed by a 
thorough analysis of the most recent XAI study findings. 
Abstract 

Threat detection is a way of analyzing an entire system to predict the malicious 
activity taking place over the network. Generally, threat detection methods 
include configuration, indicator, modelling, and threat behavior. Currently, 
phishing is one of the most popular attacks on the internet, and it is 
growing exponentially. It steals highly sensitive information through a 
website that closely resembles an authorized website. Phishing involves not only 
sending emails and waiting for a reply but also capturing information 
that passes through digital communication media. Normally, a technophile 
induces the user to give up information from managerial assets and networks. 
Artificial intelligence has recently become the best way to analyze vast amounts 
of data. With the advent of artificial intelligence techniques, threat detection 
software behaves like a technophile, which in turn helps to identify 
cybercriminals. Several deep learning algorithms can currently be used to predict 
phishing websites and anomalous behavior. This work combines recurrent neural 
networks with the Adam optimizer to build a hybrid learning model that assesses 
whether a website URL is good or bad. The proposed model outperforms various 
existing deep learning models, with an accuracy of 97%, precision of 97%, recall 
of 98%, and F1 score of 97%. 
8.1 Introduction 


The web is an important channel for organizing [1] business, accessing 
news, financial transactions, playing games, interacting with government 
bodies, entertainment, and other types of services. In this digital era, 
cybersecurity has become a serious issue. A cyberattack is the malicious 
theft of the information of another person or organization by an 
individual or organization. Attacks can happen through access to key 
network components, installation of harmful software or spyware, messages 
that claim to come from a reputable organization, etc. Threat detection is 
a way of analyzing an entire system to predict the malicious activity 
taking place over the network. Generally, threat detection methods include 
configuration, indicator, modelling, and threat behavior. Nowadays, 
phishing is one of the most serious and widespread cyber threats. 


8.1.1 Phishing 


Phishing (Figure 8.1) is an attack that obtains a target's sensitive
information, such as personal details, credit card details, login information,
and banking data, via email, websites, social networks, or messages. The main
channels of a phishing attack are email and websites. Phishers can also
introduce content into the targeted system, and they can modify an email
address so that it resembles an original email address. With the help of this
information, phishers can access the victim's accounts, which results in
monetary loss [2]. Apart from phishing, voice phishing also exists, and
cybercriminals continuously develop new types of phishing strategies. The
increased usage of social media provides a fertile field for phishing attacks
because personal details are shared more and more widely. Statistics say that
250 million apps are downloaded per day.


Figure 8.1 Phishing attack.


Attacks are classified into three types [3]:


• Attack initiation
• Data collection
• System penetration


i. Attack initiation: It involves two categories, technical and
behavioral attacks. The first involves delivering suspicious
content through spoofed email. The second focuses on getting
the target's sensitive information.

ii. Data collection: It is about collecting the information that
targets reveal while interacting with the attack materials.
This can be done automatically or manually. Automated data
collection is mainly based on forged web forms, recorded
messages, key loggers, event invitations, reward offers,
and so on.

iii. System penetration: It makes use of the resources of the
system to make the initiation of the phishing attack easier.
Fast-flux and cross-site scripting are the two strategies for
penetration.


A phishing attack [4] succeeds mainly on a personal computer system
for the following five reasons: users lack basic knowledge of Uniform
Resource Locators (URLs); users do not know exactly which pages can be
trusted; users cannot see the full address of a page because of redirection
or hidden URLs; a URL has numerous possible variants; and some pages are
entered unintentionally, so users cannot distinguish a phishing page from
the genuine pages. Figure 8.2 represents the statistics of phishing attacks
from a 2021-2022 report
(https://docs.apwg.org//reports/apwg_trends_report_q1_2022.pdf).

Timely and effective detection of phishing URLs is critical [5] because it
safeguards internet users from phishing attacks. On the client side, web
browsers are offered a blacklist by services such as Microsoft SmartScreen
Filter and Google Safe Browsing, built from reports collected on URLs and
their corresponding pages. On the server side, phishing detection services
update their URL blacklists based on scans by security experts and on reports
from advanced users or experienced researchers, gathered together with the
URLs and their respective web pages. During phishing detection, a false
positive can prevent the user from visiting a legitimate website, whereas a
false negative exposes the user to the phishing attack. Blacklist-based
detection can attain a false positive rate of approximately zero. However,
new phishing URLs cannot be added to the blacklist in a timely fashion, so
false negatives occur repeatedly. In contrast, experience-based fraud
identification can produce some false positives, but it has the advantage of
identifying current phishing URLs in real time.


Figure 8.2 Phishing attack survey from 2021-2022.


8.1.2 Features 


• Eye-catching statements
• Sense of urgency
• Attachments
• Hyperlinks
• Unusual sender
• Website replica


Many researchers have shown that detecting phishing attacks with
machine learning algorithms that use heuristic approaches can attain
higher accuracy with lower false positive rates. They used two kinds of URL
features: lexical and host features. Lexical features are features of the
text, such as word and n-gram statistics of the URL string itself. Host
features describe the domain, its registration, and the location of the
server hosting the URL.

To protect against this attack, an automated mechanism is required to
detect malicious content at an early stage, before it reaches the user.
One of the main advantages of using a deep learning model here is that
there is no need to extract features manually. A recurrent neural network
(RNN) is used to predict phishing websites. A neural network is a
machine learning model comprising interconnected artificial neurons. An RNN
is an artificial neural network that is suited to modelling sequential
patterns. Its distinguishing characteristic is that it incorporates time
into the model, which permits it to process data sequentially, one element
at a time, and to grasp sequential dependencies. An RNN takes the outcome of
the previous step as part of its input. This remembering of the previous
step is done by the hidden layer of the recurrent neural network, which acts
as a memory that retains some information about the sequence. Also, an RNN
gives a high prediction rate due to the presence of more hidden layers in its
architectural design.


8.1.3 Optimizer Types


An optimizer is a module that updates neural network attributes such as
the weights and the learning rate. It is used to improve performance and
decrease the error rate, as shown in Figure 8.3 below. Different types of
optimizers are utilized to reduce the loss function: Gradient Descent (GD),
Mini-Batch GD, Stochastic GD (SGD), SGD with Momentum, Adadelta, Root Mean
Square Propagation (RMSprop), Adagrad, and Adam.


Figure 8.3 Optimizer workflow (training data input, model output, predicted
output compared with target output, error, optimization method).


8.1.4 Gradient Descent


The Gradient Descent optimizer is the most traditional optimizer and is
used for solving convex optimization problems. Computing the gradient of the
cost function with respect to the parameters needs a large amount of memory
and slows the process, since the gradient is estimated over the entire
dataset in each epoch.

Stochastic Gradient Descent: It inherits the properties of gradient
descent. This optimizer can deal with non-convex optimization problems.
Instead of processing the whole batch, it performs an update for a single
sample at a time. It uses the same learning rate for all parameter values.
It is rarely used in applications because of its slow computation, and its
frequent updates are expensive.

Mini-Batch Gradient Descent combines SGD and batch gradient descent.
The training data are divided into batches, and updates are performed
batch by batch. It does not guarantee good convergence in every case, but it
occupies less memory.

SGD with Momentum adds a momentum term to regular Stochastic Gradient
Descent. Using momentum reduces the noise in the updates; the need for an
extra hyperparameter is one of the drawbacks of this optimizer.
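
The mini-batch and momentum updates described above can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the training code used in this chapter: the function name `minibatch_sgd_momentum`, the noise-free toy data (y = 3x), and the hyperparameter values are all hypothetical.

```python
import random


def minibatch_sgd_momentum(xs, ys, lr=0.05, beta=0.9, batch_size=4,
                           epochs=200, seed=0):
    """Fit y ~ w * x with mini-batch SGD plus momentum (illustrative only)."""
    rng = random.Random(seed)
    w, velocity = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch only.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            # Momentum: accumulate a decaying sum of past gradients.
            velocity = beta * velocity + grad
            w -= lr * velocity
    return w


# Toy data (assumed for the example): y = 3x without noise.
xs = [0.1 * i for i in range(1, 21)]
ys = [3.0 * x for x in xs]
w_hat = minibatch_sgd_momentum(xs, ys)
print(round(w_hat, 3))
```

Setting `batch_size=1` gives plain SGD with momentum, and setting `batch_size=len(xs)` gives full-batch gradient descent, so the three variants differ only in how many samples contribute to each update.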

The Adagrad optimizer is an adaptive gradient optimization algorithm. The
learning rate plays an essential role in determining the updated parameter
values. This optimizer uses a different learning rate for each parameter at
each step, and it works very well for sparse data. Adadelta is an extension
of the adaptive gradient optimizer, and it addresses Adagrad's aggressive
decay of the learning rate toward very small values.

The Adadelta and Root Mean Square Propagation (RMSprop) algorithms were
developed at about the same time to resolve the diminishing learning rate
problem of the adaptive gradient optimizer. Both use an exponentially
weighted average to determine the learning rate. RMSprop is an adaptive
learning technique that divides the learning rate by the square root of an
exponentially weighted mean of squared gradients. RMSprop is used to
accelerate the optimization process.

The Adam optimizer is the Adaptive Moment Estimation method, which
computes the learning rate adaptively for every parameter at each step. It
combines RMSprop and gradient descent with momentum to find the parameter
values. It is very popular for non-convex optimization problems: it needs
little memory, it computes quickly, it suits non-stationary objectives, and
it handles large datasets and large numbers of parameters well. Compared
with the other adaptive learning algorithms, the Adam optimizer trains the
neural network in less time and gives good results.
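
To make the combination of momentum and RMSprop concrete, the following Python sketch applies the standard Adam update to a one-dimensional quadratic. It is a minimal illustration, not the optimizer configuration used in this work; the function name `adam_minimize`, the objective (w - 4)^2, and the hyperparameter values are assumptions made for the example.

```python
import math


def adam_minimize(grad_fn, w0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a one-dimensional function with the Adam update rule."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g        # first moment (momentum part)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (RMSprop part)
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Hypothetical objective f(w) = (w - 4)^2, whose gradient is 2 * (w - 4).
w_min = adam_minimize(lambda w: 2 * (w - 4.0), w0=0.0)
print(round(w_min, 3))
```

The step size for each parameter is the base learning rate scaled by the ratio of the two moment estimates, which is what makes the effective learning rate adaptive.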


8.1.5 Types of Phishing Attack Detection 


Phishing attack detection approaches can be classified into three types:
heuristic and machine learning-based approaches, proactive phishing
approaches, and blacklist and whitelist approaches. In the heuristic and
machine learning approach, class labels are available in the dataset so
that the prediction process can be carried out correctly. It is based on
supervised and unsupervised learning algorithms. Proactive phishing
approaches are similar to machine learning approaches and help users to
identify URLs as legitimate or malicious by processing the URLs'
information. Blacklist and whitelist approaches are the traditional methods
for phishing URL detection. These methods are now less favored because the
growth of web content makes phishing tedious to track with lists.
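
The limitation of list-based detection can be seen in a few lines of Python. The sketch below is a hypothetical illustration only: the function name `list_based_verdict` and the host names are invented for the example, and real services match URLs in more elaborate ways.

```python
from urllib.parse import urlsplit


def list_based_verdict(url, blacklist, whitelist):
    """Classify a URL by looking its host up in a blacklist and a whitelist."""
    host = (urlsplit(url).hostname or "").lower()
    if host in blacklist:
        return "phishing"
    if host in whitelist:
        return "legitimate"
    return "unknown"  # hosts on neither list cannot be judged


# Hypothetical lists for the example.
blacklist = {"paypa1-login.example"}
whitelist = {"www.example.org"}

print(list_based_verdict("http://paypa1-login.example/signin", blacklist, whitelist))
print(list_based_verdict("https://www.example.org/", blacklist, whitelist))
print(list_based_verdict("http://brand-new-phish.example/", blacklist, whitelist))
```

A listed host is never misjudged, but a newly registered phishing host falls through as "unknown" until someone adds it to the list, which is the gap that learning-based approaches try to close.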


8.2 Literature Survey 


Anand Joseph Daniel [9] introduced a Machine Learning (ML) based
algorithm for the detection of insecure websites that lure online users in
order to obtain their usernames and passwords. The proposed methodology
used a mixture of Random Forest (RF) and Support Vector Machine (SVM)
classifiers to detect phishing websites and categorize them into three
types, namely malicious, spam, and benign. The proposed methodology achieves
an Accuracy (ACC) of 90% when tested with the GENI phishing dataset. The
main drawback is that external factors are not considered, which will
reduce the ACC of the model when it is tested on a large database.

It can be difficult to build good classification models using skewed
training data. Mahmoud [10] introduced RUSBoost, an approach that
addresses the issue of class imbalance. When training data is unbalanced,
RUSBoost combines data sampling and boosting to offer a quick and easy
way to improve classification performance. Additionally, RUSBoost
outperforms SMOTEBoost, another hybrid sampling/boosting algorithm. It has
significantly shorter model training times and is less computationally
expensive than SMOTEBoost.

Brad Wardman [11] proposed a set of file-matching methodologies that
use content-based approaches to detect the set of websites affected by
phishing attacks that intrude on the user's content. The file-matching
algorithms used include main index matching, PhishDiff, deep MD5
matching, and context-triggered piecewise hashing. The dataset used for
the research consists of 49,840 URLs for which web contents are available.
The URLs are matched with the labels of the Cyveillance company dataset in
order to check for phished URL content. Each of the string matching
algorithms identifies the phished URLs by comparison with the Cyveillance
company dataset. PhishDiff gives a high prediction rate when compared with
the other algorithms. The proposed approach gives an accuracy greater than
90%; its main drawback is the long time taken by the matching process.

Mohammad Nazmul [12] proposed a framework using a set of ML
algorithms, namely decision tree and RF. The proposed approach was tested on
a phishing dataset collected from the Kaggle repository. Principal component
analysis was utilized to reduce the size of the dataset and to select the
important attributes for faster detection of phishing websites. In the
detection of phishing websites, the random forest algorithm performs better
than the decision tree algorithm and gives an ACC of 97.2%. The drawback of
the approach is that the proposed method works well with the benchmarked
dataset only.

Wenwu Chen [13] proposed an effective approach using Long short-term
memory (LSTM), which is based on the Recurrent neural network (RNN),
for the detection of phishing websites. The dataset used for the work was
gathered from Yahoo and Phishtank. The dataset was classified into two
labels, namely original and phished. The features are reduced using
dimensionality reduction in order to predict the phished websites
effectively. The proposed approach outperforms the traditional RNN, giving
an accuracy of 97%. The main issue with the proposed method is the long time
needed to train the neural network.

Sountharrajan [14] proposed a deep learning-based approach using
Deep Boltzmann Machine (DBM) and Stacked Auto encoder (SAE) for the
detection of phishing URLs. The dataset used for the study was collected
from the Kaggle repository. Initially, a feature selection process was
carried out to reduce the dimensions of the dataset in order to classify
the phished URLs effectively. The dataset was split into 80% and 20% for the
training and testing phases, respectively. After the feature selection
process, classification was done using a Deep neural network (DNN). The
accuracy of the proposed work is more than 85%. The main drawback of this
method is the difficulty involved in training the multiple layers of the
neural network.

Fatima Salahdine [15] proposed an ML-based methodology for finding
phishing websites. More than 4,000 phished emails sent to North
Dakota University were collected and analyzed for the detection process.
The important features of the emails are selected using a feature selection
process and used for testing and training purposes. Algorithms such as SVM,
Logistic regression (LR), and ANN were used for the detection of phished
websites. Various performance metrics such as false positive rate, true
positive rate, recall, and precision were considered in the study. SVM with
a radial basis function outperforms all other classifiers in the prediction
of phished websites. The main drawback of the study is the difficulty of
choosing the correct feature selection process for the detection of
malicious websites so that the privacy of the information is preserved.

Ishita Saha [16] proposed a deep learning approach using the multi-layer
perceptron (MLP) algorithm for the detection of malicious websites. The
dataset for the study was gathered from the Kaggle repository and has
information about 10,000 websites. The detection method involves three
phases, namely data collection, pre-processing, and classification of
websites. In the pre-processing stage, feature selection methods are
employed to select the subset of features used for classification. While
training the model, each layer is trained so that detection is performed
correctly in the testing phase. The proposed detection model yields an
accuracy of 95%. Training each layer of the neural network takes more time,
which makes the model difficult to use.

Ram Basnet [17] proposed a novel framework for the detection of
phishing attacks using machine learning algorithms. The dataset used for
the study was collected from the Ham corpora and the Phishing corpus, which
together contain both legitimate and illegitimate emails. Biased SVM and
Self-organizing maps (SOMs) were used for classification. Also, a
clustering algorithm, namely the k-means algorithm, was used to cluster the
set of legitimate websites. The biased SVM-based approach yields an accuracy
of 90% when compared with the SOM algorithm. The drawback of the proposed
approach is that ranking of features must be performed in order to yield
more accuracy.

Alfredo Cuzzocrea [18] proposed an ML-based approach to detect
phishing websites using the Decision tree (DT) algorithm. The PhishTank
dataset was used for the study, in which only 10 features from the dataset
were considered for the prediction. The proposed method has a learning
step and a prediction step. In the learning step, a feature vector is
generated for the dataset and used for model building, whereas in the
prediction step, the test dataset is fed as input to the trained model and
the output is analyzed to measure the performance of the proposed
framework. The time taken by the learning phase is somewhat high, which
makes the proposed model less likely to be used by researchers.

Due to weak data security, phishers can steal important information
stored in a cloud environment. Dutta [19] proposed a neural network-based
algorithm for the detection of malicious URLs that cause phishing
attacks. The Phishtank and AlexaRank datasets, which consist of both
legitimate and phished websites, were used for the study. During the
training phase, features are extracted and then used to build the
classification model. The model is then tested using various datasets to
find the accuracy of the proposed system. The model found 7,900 phished
websites among 10,000 input websites, which indicates high accuracy of the
built model.

A URL-based machine learning anti-phishing technique was suggested
by Jain and Gupta [20]. To verify the effectiveness of their strategy, the
authors took 14 attributes from the URL to identify a website as malicious
or authentic. The suggested method was trained using nearly 33,000
phishing and legitimate URLs with Naive Bayes (NB) and SVM classifiers.
The process of learning was the main emphasis of the phishing detection
approach. The 14 features they identified distinguish authentic websites
from phishing websites. With SVM classification, the results of their
trial show over 90% accuracy.


8.3 Proposed Work


The multilayer perceptron (MLP) is a classic example of a neural network
(NN) model: a hierarchical collection of neurons, or units, for high-level
computation. Due to its layered architecture, it is possible to extract a
large number of features from simple data, which in turn makes the
prediction process easier. The versatile structure of the NN makes it well
suited to feature extraction and learning. The flow diagram of the proposed
work is shown in Figure 8.4.

The dataset used for the study is divided into 70% for the training phase
and 30% for the testing phase. The input dataset is first pre-processed to
remove inconsistent data, and then important features are extracted
from the data using feature extraction algorithms so that they can be used
in the classification process. After the feature extraction process, the
extracted features are given to the Adam optimizer for fine tuning of the
required features. Lastly, the extracted features are given to the
classifier for the prediction process. The prediction process categorizes
each URL as phished, legitimate, or suspicious so that the user can easily
identify malicious websites. Phished URLs are fraudulent and can cause
harmful actions when the user clicks on them. Legitimate URLs are trusted
URLs that users can safely use. Suspicious URLs resemble official URLs and
are intended to deceive and harm users.


Figure 8.4 Flowchart of the proposed work (training phase and testing phase:
data pre-processing, feature extraction, feature sets, algorithms).

Recurrent neural networks (RNNs) are widely used for processing
sequential data and can learn deeply, using their different layers of
neurons, to predict correctly. RNNs share a benefit of Markov chain models
in that they process data sequentially, taking the order of the data into
consideration. Typically, the input text is reduced to a series of
letters, words, or phrases. Since RNNs form the foundation of many current
language classification models, they can be used to classify emails
accurately so that phishing is detected at a high rate.


Figure 8.5 Architecture of RNN.


It is important to note that prior works have utilized RNNs to classify
harmful URLs and websites [21]. Researchers have utilized a number of
factors and accurately identified websites and URLs as phished, suspicious,
or legitimate. The phished website detection problem can be treated as a
binary classification problem with two classes, namely ham URLs and
malicious URLs. It is straightforward to abstract this classification to
predicting the value of x, where x equals 1 for phish and 0 for ham. The
structure of the RNN is shown in Figure 8.5.

The RNN implementation can be represented mathematically as in equation
(8.1):


\[ h(t) = f_H\left(W_{IH}\,x(t) + W_{HH}\,h(t-1)\right) \]

\[ y(t) = f_O\left(W_{HO}\,h(t)\right) \tag{8.1} \]


Here x(t) and y(t) represent the input and output vectors, and h(t) is the
hidden state.

W_{IH}, W_{HH}, and W_{HO} represent the weight matrices of the network.

f_H and f_O represent the hidden activation function and the output
activation function, respectively.
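
Equation (8.1) can be traced step by step with a short pure-Python sketch. It is a minimal illustration rather than the model trained in this chapter: the choice of tanh for the hidden activation and the sigmoid for the output activation, the function names, and the weight values are all assumptions made for the example.

```python
import math


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rnn_forward(xs, W_ih, W_hh, W_ho):
    """Run the recurrence of equation (8.1) over a sequence of input vectors."""
    h = [0.0] * len(W_hh)  # initial hidden state h(0)
    outputs = []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(W_ih, x), matvec(W_hh, h))]
        h = [math.tanh(p) for p in pre]             # hidden activation (assumed tanh)
        y = [sigmoid(o) for o in matvec(W_ho, h)]   # output activation (assumed sigmoid)
        outputs.append(y)
    return outputs, h


# Hypothetical weights: 2 inputs, 2 hidden units, 1 output.
W_ih = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [-0.4, 0.3]]
W_ho = [[1.0, -1.0]]
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, final_h = rnn_forward(sequence, W_ih, W_hh, W_ho)
print(outputs)
```

Because h is carried from one iteration to the next, the output for the last element depends on every earlier element of the sequence, which is the memory property described in the text. With a sigmoid output, each y(t) can be read as a score between 0 (ham) and 1 (phish).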


8.3.1 Data Collection and Pre-Processing 


The proposed model predicts whether a website Uniform Resource Locator
(URL) is good or bad. The Web page Phishing Detection Dataset contains
a wide variety of data related to intrusions and anomalous URLs. This
dataset comprises 11,430 URLs with 87 extracted features. It acts as a base
for detecting phishing URLs using machine learning algorithms. The features
are categorized into three classes: 56 features are extracted from the
structure and syntax of the URL, 24 features are extracted from the content
of the corresponding pages, and 7 features are extracted by querying
external services. The dataset is balanced, with 50% phishing URLs and 50%
legitimate URLs. The features include URL length, host name length, IP, path
length, URL entropy, length ratio, punctuation count, suspicious word count,
etc. A series of experiments was conducted to analyze the recurrent neural
network-based classifier using the Web page Phishing Detection Dataset [6].
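
Several of the structure and syntax features listed above can be computed directly from the URL string. The following Python sketch shows one possible way to do so. It is not the extraction code behind the dataset: the feature names, the regular expression used to spot an IP address host, and the `SUSPICIOUS_WORDS` list are hypothetical choices made for the example.

```python
import math
import re
import string
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical word list; the dataset's own list is not given in the text.
SUSPICIOUS_WORDS = ("login", "verify", "secure", "account", "update")


def shannon_entropy(text):
    """Shannon entropy (bits per character) of a string."""
    counts = Counter(text)
    total = len(text)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def url_features(url):
    """Compute a few lexical features of a URL (illustrative subset)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "hostname_length": len(host),
        "path_length": len(parts.path),
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "url_entropy": shannon_entropy(url),
        "punctuation_count": sum(ch in string.punctuation for ch in url),
        "suspicious_word_count": sum(url.lower().count(w) for w in SUSPICIOUS_WORDS),
    }


features = url_features("http://192.168.0.1/login/verify.php")
print(features)
```

Each URL becomes a fixed-length numeric record, which is the form a classifier expects as input.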

A Uniform Resource Locator is a Uniform Resource Identifier that
locates the resources available on the internet [7] and is used for web
client requests and responses. A URL is made up of a sequence of strings,
some of which carry little semantic meaning. It is very tedious to derive
semantic meaning from the strings in some URLs because the joined words are
incoherent.

A simple example of a URL is given below:


https://www.google.co.in/search?q=wikipedia&sxsrf=ALiCzsax_704CVYK3dXFzZSX1dWRzU0B5dQ%3A1664036899393&source=hp&ei=IzAvY_i2Ffzh4-EPsaWWmAM


The above link begins with a scheme that identifies the protocol used, such
as HTTP or HTTPS. The second part is the host name, which identifies the
machine that holds the resources. It contains generic and country code
top-level domain labels. In the above link, “co” plays the role of the
generic domain label at the second level, and “in” is the country code
top-level domain. The path refers to the location of the resource on the
host. In this example, the path is


/search


The next component is the query string, the part of the URL that assigns
values to designated parameters. In this example, the query string is


q=wikipedia&sxsrf=ALiCzsax_704CVYK3dXFzZSX1dWRzU0B5dQ%3A1664036899393&source=hp&ei=IzAvY_i2Ffzh4-EPsaWWmAM
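
The components of a URL can be separated with Python's standard `urllib.parse` module. The sketch below uses a shortened form of the example link (an assumption made to keep the listing readable) and reports the path and the query string as separate components.

```python
from urllib.parse import parse_qs, urlsplit

# Shortened form of the example URL, used only for illustration.
url = "https://www.google.co.in/search?q=wikipedia&source=hp"

parts = urlsplit(url)
components = {
    "scheme": parts.scheme,            # protocol, here https
    "hostname": parts.hostname,        # machine that holds the resource
    "path": parts.path,                # location of the resource on the host
    "query": parse_qs(parts.query),    # parameter names mapped to lists of values
}
host_labels = parts.hostname.split(".")  # rightmost label is the top-level domain

print(components)
print(host_labels)
```

Splitting the URL this way is a convenient first step before computing per-component features such as host name length or path length.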

Extracted features [8] vary in format and length, so they undergo
pre-processing before they reach the later stages. Extracted features
may be in any form, such as text, binary format, or numeric values such as
website sitemap depth or page content similarity. The objective of
pre-processing is to reduce the complexity and training time of the
network [22-26]. Extracted features that contain text require the
following further pre-processing steps.


i. Cleaning: Remove spaces, special symbols, and unfamiliar
words.

ii. Words with vector representation: Words in the text are
converted into vector format. Many tools are available for
this conversion. The vectors capture word similarity, which
reduces the amount of language the network has to learn and
helps it to handle unseen words.

iii. Input standardization: When the data are unscaled, have a
huge value range, or contain large inputs, the convergence
of the framework slows down. Input standardization shortens
the training time and reduces the chance of getting stuck in
local optima. For input normalization, the values are scaled
to zero mean and unit variance, as for a standard Gaussian
distribution.

iv. Labelling: Labelling webpages with metadata tags is helpful
for the training process.

v. Deduplication: Datasets may contain redundant URLs and
different URLs for the same host. The data deduplication
process eliminates repeated URLs and URLs with the same
host.
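
Steps iii and v above can be sketched in a few lines of Python. This is an illustrative fragment under stated assumptions, not the pipeline used for the experiments: the function names `standardize` and `dedupe_by_host` and the sample values are hypothetical.

```python
from statistics import mean, pstdev
from urllib.parse import urlsplit


def standardize(values):
    """Rescale a numeric feature column to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - mu) / sigma for v in values]


def dedupe_by_host(urls):
    """Keep only the first URL seen for each host name."""
    seen, kept = set(), []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if host not in seen:
            seen.add(host)
            kept.append(url)
    return kept


url_lengths = [23.0, 54.0, 120.0, 31.0]   # hypothetical feature column
scaled = standardize(url_lengths)
unique_urls = dedupe_by_host([
    "http://a.example/x",
    "http://a.example/y",
    "http://b.example/",
])
print(scaled)
print(unique_urls)
```

Keeping one URL per host prevents a single heavily sampled site from dominating training, at the cost of discarding some distinct pages.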


8.3.2 Dataset Description 


Benchmarked datasets are used to evaluate the performance of the
proposed system. The datasets used for the study include a normal dataset
and a phishing dataset. Two sets of datasets are used for the analysis of
the proposed system. The first dataset, named the web page phishing
detection dataset, is collected from Kaggle and is available at
https://www.kaggle.com/datasets/shashwatwork/web-page-phishing-detection-dataset.
The dataset has 11,430 URLs and 87 features. The features represent three
different classes: 56 features denote the URLs' structure and syntax, 24
features are taken from the content of each page, and 7 features are
extracted by querying external services. The dataset has an equal
distribution of legitimate and phished URLs; each makes up 50% of the total
size.

The second set of data used for the study is collected from the Phishtank
dataset service provider. This provider has 5,000 phishing URLs that can
be used for the model analysis and provides the dataset in CSV and JSON
formats. The link to download the dataset is
https://www.phishtank.com/developer_info.php. The legitimate URLs are
collected from New Brunswick University's dataset collection. In total,
5,000 legitimate URLs are taken for the analysis of the proposed system.
The dataset is available at https://www.unb.ca/cic/datasets/url-2016.html.


8.3.3 Performance Metrics 


To assess the effectiveness of the proposed model, accuracy, precision,
recall, F-measure, and error rate are taken into account as performance
metrics.

Accuracy is the fraction of all websites whose phishing or legitimate
status is predicted correctly, as shown in equation (8.2).


\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{8.2} \]


High recall means that few phishing websites are classified as
legitimate, as shown in equation (8.3).


\[ \text{Recall} = \frac{TP}{TP + FN} \tag{8.3} \]


High precision means that few legitimate websites are identified as
phishing websites, as shown in equation (8.4).


\[ \text{Precision} = \frac{TP}{TP + FP} \tag{8.4} \]


The F-measure is the harmonic mean of precision and recall, as shown in
equation (8.5).


\[ \text{F-measure} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8.5} \]


The error rate is the fraction of websites, legitimate or phishing, that
are wrongly categorized, as represented in equation (8.6).


\[ \text{Error rate} = 1 - \frac{TP + TN}{P + N} \tag{8.6} \]


In the above equations, FP and FN are the false positives and false
negatives, TP and TN are the true positives and true negatives of the
model, and P and N are the total numbers of positive (phishing) and negative
(legitimate) websites. High precision is recommended when false positives
are to be avoided, and high recall is preferable in situations where false
negatives are not acceptable. A higher value of the F-measure indicates
better performance of the model.
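
The metrics in equations (8.2) to (8.6) follow directly from the four confusion matrix counts, as the Python sketch below shows. The function name `classification_metrics` and the counts in the usage example are hypothetical, and the sketch assumes that none of the denominators is zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics of equations (8.2)-(8.6) from confusion counts.

    Assumes tp + fn, tp + fp, and precision + recall are all non-zero.
    """
    total = tp + fp + tn + fn              # equals P + N
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "error_rate": 1 - accuracy,
    }


# Hypothetical counts: 90 phishing sites caught, 15 missed,
# 85 legitimate sites passed, 10 wrongly flagged.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
print(metrics)
```

In this example precision (0.90) is higher than recall (about 0.857), meaning the hypothetical classifier raises few false alarms but misses some phishing sites.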


8.4 System Evaluation 


This section describes the performance of the proposed method (HTML
and URL features) in terms of various performance metrics. Different ML
classifiers are implemented to evaluate the extracted features that are
used in the proposed method.

Table 8.1 shows the performance of different features extracted from
text, namely TF-IDF word level, TF-IDF N-gram level, TF-IDF character
level, Global Vectors (GloVe) pre-trained word embedding, character
sequence vectors, count vectors (bag-of-words), word sequence vectors, and
trained word embedding, with various classifiers. The aim of the designed
system is to find the textual features best suited to the selected
data. From the obtained results, it is observed that character-level TF-IDF
features provide the best performance compared with the other features,
with significant accuracy, precision, F-score, recall, and AUC using the
XGBoost and Deep Neural Network classifiers. Thus, the TF-IDF
character-level technique is implemented in this work to generate the text
features (F2) of the webpage.
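
A character-level TF-IDF representation can be sketched in pure Python as follows. This is one common variant, given as an illustration only: the chapter does not state the exact weighting used, so the trigram size, the length-normalized term frequency, the smoothed inverse document frequency, and the function names are assumptions.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_char_level(docs, n=3):
    """Map each document to a dict of character n-gram TF-IDF weights."""
    tokenized = [Counter(char_ngrams(d.lower(), n)) for d in docs]
    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter(g for counts in tokenized for g in counts)
    n_docs = len(docs)
    vectors = []
    for counts in tokenized:
        total = sum(counts.values())
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            g: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[g])) + 1.0)
            for g, c in counts.items()
        })
    return vectors


docs = ["paypal login", "paypal secure", "weather news"]  # hypothetical page texts
vectors = tfidf_char_level(docs)
print(sorted(vectors[0].items(), key=lambda kv: -kv[1])[:3])
```

N-grams that occur in many documents, such as "pay" here, receive lower weights than n-grams specific to one document, such as "log". Character n-grams also tolerate the deliberate misspellings that are common in phishing text, which may be one reason this representation performs well.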


Table 8.1 The different textual content features performance on D1 dataset with
different classifiers.


Classifier | Textual data features | Precision (%) | F-score (%) | AUC (%) | Recall (%) | Accuracy (%)
XGBoost | TF-IDF N-gram level | 88.65 | 89.53 | 90.43 | 91.33 | 92.25
 | TF-IDF character level | 89.90 | 90.80 | 91.71 | 92.62 | 93.55
 | Word sequence vectors | 83.49 | 84.32 | 85.16 | 86.02 | 86.88
 | Character sequence vectors | 82.28 | 83.11 | 83.94 | 84.78 | 85.63
[unreadable] | TF-IDF word level | 85.35 | 86.20 | 87.06 | 87.93 | 88.81
 | TF-IDF N-gram level | 83.27 | 84.11 | 84.95 | 85.80 | 86.66
 | TF-IDF character level | 77.21 | 77.99 | 78.77 | 79.55 | 80.35
 | Count vectors | 83.45 | 84.28 | 85.12 | 85.97 | 86.83
 | Word sequence vectors | 63.52 | 64.15 | 64.80 | 65.44 | 66.10
 | Trained word embedding | 89.02 | 89.91 | 90.81 | 91.72 | 92.64
CNN | Character embedding | 82.88 | 83.71 | 84.55 | 85.39 | 86.25
 | Trained word embedding | 90.47 | 91.37 | 92.28 | 93.21 | 94.14
[unreadable] | TF-IDF word level | 86.80 | 87.67 | 88.54 | 89.43 | 90.32
 | TF-IDF N-gram level | 87.64 | 88.51 | 89.40 | 90.29 | 91.20
 | TF-IDF character level | 86.29 | 87.16 | 88.03 | 88.91 | 89.80
 | Count vectors | 86.67 | 87.53 | 88.41 | 89.29 | 90.19
 | Word sequence vectors | 82.38 | 83.20 | 84.03 | 84.87 | 85.72
 | Character sequence vectors | 80.31 | 81.11 | 81.92 | 82.74 | 83.57
DNN | TF-IDF word level | 87.95 | 88.83 | 89.72 | 90.62 | 91.52
 | TF-IDF N-gram level | 89.00 | 89.89 | 90.79 | 91.70 | 92.62
 | TF-IDF character level | 89.20 | 90.10 | 91.00 | 91.91 | 92.83
 | Word sequence vectors | 54.80 | 55.35 | 55.90 | 56.46 | 57.03
 | Character sequence vectors | 77.17 | 77.95 | 78.73 | 79.51 | 80.31
[unreadable] | TF-IDF N-gram level | 86.08 | 86.94 | 87.81 | 88.69 | 89.58
 | TF-IDF character level | 85.40 | 86.25 | 87.11 | 87.98 | 88.86
 | [unreadable] | 87.71 | 88.59 | 89.47 | 90.37 | 91.27
 | Word sequence vectors | 56.43 | 56.99 | 57.56 | 58.14 | 58.72

Table 8.2 displays the experimental results with hyperlink features.
From the results, it is evident that RF performs better than the other
classifiers, with an accuracy of 83.09%, precision of 78.37%, AUC of
83.40%, recall of 86.96% and F_Measure of 82.45%. The ensemble and
XG Boost classifiers follow closely, with accuracies of 83.00% and 81.29%,
respectively.

In Table 8.3, we combined URL and HTML (hyperlink and text) features
using different classifiers to check their complementary behavior in
phishing site detection. The experimental results show that the LR
classifier achieves adequate precision and performance on the HTML
features. Interestingly, the NB classifier achieves good precision,
accuracy, F1-score, AUC and recall when all of the features are combined.
The RNN and ensemble classifiers achieved high precision, recall, F1-score
and AUC on the URL-based features.
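
The relationship between the reported precision, recall and F-score values can be checked with a few lines of arithmetic. The sketch below assumes the standard definition of the F-score as the harmonic mean of precision and recall, and assumes that the ensemble entries of Table 8.3 pair up as listed in `ensemble_rows`; that pairing is read off the flattened table and is not stated explicitly in the text.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumed pairing of (precision, recall) for the ensemble classifier,
# taken from the flattened Table 8.3; treat it as illustrative.
ensemble_rows = {
    "FURL": (99.40, 92.97),
    "FHTML": (91.12, 82.83),
    "FURL + HTML": (94.83, 88.73),
}

for features, (precision, recall) in ensemble_rows.items():
    print(f"{features}: F-score = {f_score(precision, recall):.2f}%")
```

Under these assumptions the computed values are 96.08%, 86.78% and 91.68%, which agree with the F-score entries that appear alongside those rows.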
Table 8.2 Performance of the proposed AO-RNN hyperlink features on D1 with
different classifiers.

Classifier | Precision (%) | AUC (%) | Recall (%) | F_Measure (%) | Accuracy (%)
RF | 78.37 | 83.40 | 86.96 | 82.45 | 83.09


Table 8.3 Various feature combinations on dataset D1.

Classifier | Features | Precision (%) | Recall (%) | F-score (%) | AUC (%) | Accuracy (%)
Ensemble | FURL | 99.40 | 92.97 | 96.08 | 96.35 | 96.65
Ensemble | FHTML | 91.12 | 82.83 | 86.78 | 88.12 | 88.58
Ensemble | FURL + HTML | 94.83 | 88.73 | 91.68 | 92.44 | 92.75
RNN 99.53 93.06 96.44 
a 91.68 82.80 88.35 
94.75 


87.66 91.06 91.89 | 92.25 
82.22 


ai [asi | 595i 
88.45 75.23 


62.77 73.24 77.93 | 79.22 


93.19 96.93 
88.56 89.79 


95.91 97.34 97.55 | 97.73 


68.60 74.99 
82.80 85.00 


69.43 73.68 76.82 | 77.45 


66.33 
FHTM + 87.86 


100.58 
89.09 


FURL + 99.26 
HTML 


75.42 
84.34 
78.49 




In this evaluation, we compare our approach with existing anti-phishing
approaches. The proposed approach is also assessed on benchmark dataset
D26,13,30s using the four metrics. The results are displayed in Table 8.4.
They show that the proposed approach performs better than the other
approaches examined in the survey, which demonstrates its effectiveness in
detecting phishing sites compared with existing approaches. Figure 8.1
shows the performance of RNN against other approaches using dataset 1.
Figure 8.6 shows the performance of RNN against other algorithms using
dataset 1. Figure 8.7 shows the performance of RNN against various other
approaches using dataset 2.


Table 8.4 Comparison of AO-RNN vs. other standard approaches (for
Datasets 1 and 2).

Dataset | Methods | Precision (%) | Recall (%) | F-Score (%) | Accuracy (%)

[Figure: bar chart "Comparison of AO-RNN with other approaches for dataset 1";
x-axis: Precision (%), Recall (%), F-Score (%), Accuracy (%); y-axis: 50 to 90;
series: MURLNET, CNN, AO-RNN.]

Figure 8.6 Performance of RNN with other approaches.
[Figure: bar chart "Comparison of AO-RNN with other approaches for dataset 2";
x-axis: Precision (%), Recall (%), F-Score (%), Accuracy (%); y-axis: 50 to 90;
series: MURLNET, CNN, Deep CNN, AO-RNN.]

Figure 8.7 Performance of RNN with other approaches.


8.5 Conclusion 


A phishing site looks like its benign official counterpart, and the
challenge is how to tell them apart. This work has introduced a novel
anti-phishing method that combines various features (URL, hyperlink, and
text), a combination that was not discussed in any of the other works. The
approach introduced here is an entirely client-side solution. The features
proposed in this work were used with different machine learning
algorithms, and it was found that RNN achieved the best performance. Our
main aim is to design a real-time approach that produces a lower false
positive rate and a higher true negative rate. The results show that our
approach correctly filtered the benign web pages, with few benign pages
mistakenly classified as phishing. In the process of phishing page
classification, we build the dataset by extracting the relevant and useful
features from benign and phishing pages.
Abstract

Dynamic and accurate decision making while communicating information in
real-time applications exposes the network to various security threats. A
number of intruders may take over the ideal nodes and manipulate them to
their benefit, which may drastically degrade the growth of an organization
in the market. The objective of this paper is to propose a secure and
trusted communication mechanism by highlighting the various risks and
preservation methods for wireless communication. The proposed method
highlights the importance of trust-based computation while ensuring
efficient security and preservation in the network. The method is further
verified over various security metrics against a traditional scheme.
9.1 Introduction


Nowadays, huge amounts of information can be handled easily using various
modern techniques and technologies. Millions or even trillions of records
can be handled, analyzed, processed and stored through online databases
and systems such as cloud systems, Hadoop, servers, etc. [1, 2]. Human
effort is further reduced by replacing manual workloads with
smart/intelligent automation systems. Along with their several advantages,
these technologies also bring a large number of risks and challenges to the
network. Security is considered one of the most crucial issues, and it is
very difficult to manage while providing communication through various
smart devices in the network [3, 4]. It is crucial to ensure the security
of intelligent devices while they share information with each other in the
network. Dynamic systems are those in which devices are heterogeneous in
nature, communicate with each other and request message transmission in
the network. A number of security risks arise from the dynamic nature of
the devices, such as denial of service, man-in-the-middle attack,
distributed network, etc. [5, 6]. In addition, it is necessary to determine
various preserving mechanisms to ensure reliable communication among
devices in the network. There are a number of schemes to ensure security,
such as encryption techniques, ticket-based techniques, trust-based
techniques, etc. In this paper, we discuss a number of trust-based security
mechanisms.


9.1.1 Need of Trust 


The involvement of smart device automation in real-time decision making
lends more accuracy to the environment. However, any threat from an
intruder that should not be allowed to take part in the communication may
drastically affect the entire network. A number of security risk handlers,
mechanisms and algorithms have been proposed by several
researchers/scientists; however, other security risks still degrade the
performance of the entire system [7, 8]. For instance, the alteration of
ideal nodes in the network by intruders may further increase traffic
congestion, overhead, storage and other types of risks in the network.
Therefore, it is necessary to further improve the entire communication
process handled through intelligent systems or IoT devices.
Figure 9.1 A secured smart society framework having intelligent devices in the
network [7].


Figure 9.1 represents the overall communication process in a smart
automation system, where a number of IoT devices are involved in the
network and communicate efficiently with each other. Among the various
types of security models, trust-based security is considered one of the
efficient ways to secure communication in the network. Trust-based
computation improves the security process without much effect on the cost,
storage and communication of the system [9, 10].


9.1.2 Need of Trust-Based Mechanism in IoT Devices 


Though a number of security schemes have been proposed by various
researchers/scientists, it is still necessary to propose a reliable,
legitimate and transparent communication mechanism for the system.
Figure 9.2 represents the trust-based mechanism in intelligent devices,
where a number of ambient systems may lose security while communicating
across various heterogeneous systems. The integration of indirect or any
other trust-based computation, either with reinforcement learning or with
any other device, may further improve the security of the network.


9.1.3 Contribution 


The main aim of this paper is to propose a secure and efficient
communication procedure using dynamic trust-based computation, which is
used to detect/identify the trust value of each communicating device
[11, 12]. The proposed mechanism computes the updated trust degree of the
entire communication network using various security metrics. The remainder
of the paper is organized as follows. Section 9.2 describes the related
work on various security models/methods proposed by several scientists.
Section 9.3 presents the proposed communication model using trust-based
computation, which updates the trust rate of each communicating device.
Section 9.4 presents the performance analysis of the proposed framework for
validating the scenario, followed by the results discussion in Section 9.5
and an empirical analysis in Section 9.6. Finally, Section 9.7 concludes
the entire work along with its future research directions.

Figure 9.2 Trust-based mechanism in IoT devices [27].


9.2 Related Work 


This section describes a number of security approaches and methods
proposed by various academicians, scientists and authors for ensuring a
secure and effective communication process in the network [13, 14].
Table 9.1 summarizes a number of security methods/algorithms and schemes
along with their limitations.
Table 9.1 Literature survey.

Author | Technique | Measuring parameter | Limitation
Zeke et al. [15] | Secure communication process using fuzzy systems | Determined the secure evaluation factor using weight coefficient by computing the matrix and various security levels | Suffer from communication delay
Zhong et al. [16] | Index confirmation process | Detailed the concept by deducing the mathematical curves and formulas on various classification regions | Energy consumption while tracking the mechanism again and again
Angelogianni et al. [17] | Systematic identification mechanism | Evaluated the scenarios into three distinct categories | Authentication process leads to delay
Guo et al. [18] | Fuzzy and AHP based evaluation method for assessment | Considered the advantages of both the existing frameworks such as AHP and fuzzy for ensuring accurate results and operability during the risk assessment | Issues during dynamic behavior of the network
Liang et al. [19] | Risk evaluation for industrial sensor networks | Studied the issues of security and risks for wireless sensors in industries | Need to consider accuracy
Wu et al. [20] | Cross-project security | Proposed a hybrid security based on uncertainty and text similarity in the network | Needed to analyze the computation delay of trust

Zeke et al. [15] have proposed a monitoring system for ensuring a secure
communication process using fuzzy systems in the network. The authors
determined the secure evaluation factor using a weight coefficient by
computing the matrix and various security levels. The proposed mechanism
provided various methods and ideas for ensuring a secure communication
process for wireless energy stations. Zhong et al. [16] have proposed an
index confirmation process for identifying eavesdropping in a wireless
communication process. The authors detailed the concept by deducing the
mathematical curves and formulas on various classification regions. The
proposed framework is claimed to reduce the eavesdropping probability
compared with the existing method. Angelogianni et al. [17] have proposed a
systematic identification mechanism for detecting vulnerabilities while
considering present legislative frameworks. The proposed risk management
and assessment system evaluated the scenarios in three distinct categories.
The evaluated results validated the proposed framework against
state-of-the-art cellular schemes. Guo et al. [18] have proposed a fuzzy
and AHP-based evaluation method for assessing the risk in the system. The
authors combined the advantages of the existing AHP and fuzzy frameworks to
ensure accurate results and operability during the risk assessment.

In addition, Shabisha et al. [21] have proposed a new and enhanced
security mechanism for handling emergency issues in healthcare systems.
The authors proposed a key agreement and an authenticated security
mechanism that relies on symmetric key schemes for measuring the security
and anonymity of the nodes. The authors further developed a commercial
off-the-shelf system that generates warning and emergency alarms while
transmitting the information. Further, Zhang et al. [22] have studied
numerous controlling strategies, including coefficient and association
designs, from the network perspective. The authors proposed online and
offline controlling strategies for accessing the channel information. They
also proposed a dual composition transforming function for designing the
distributed controlling scheme.

Though a number of schemes have been proposed, it is still necessary to
focus on the accuracy and computational steps of trust, along with reduced
transparency time, using a blockchain system. This paper proposes a secure
and trusted communication mechanism using blockchain-based adaptive and
comprehensive trust computation of each DS, which is further verified
against the accuracy, computational delay and attack probability of each
communicating device [23-25].
9.3 Proposed Framework


9.3.1 Dynamic Trust Updation Model 


In this paper, we propose a dynamic trust updation mechanism based on a
Bayesian formulation using the beta distribution method, as proposed in
[20]. In addition, the trust mechanism is integrated with blockchain
technology in order to strengthen risk analysis and detection by providing
transparency during the distribution or transmission of messages in the
network. The trust of a node is computed by maintaining a data structure
with a number of variables such as $\alpha$, $\beta$, $p$ and $q$. Each
node $i$ maintains a trust degree for any node $j$ in its trust table $T$
as $T_{ij}$, which represents the trust degree of node $j$ maintained by
node $i$ in its trust table. In addition, the trust degree keeps updating
depending upon the node's internal communication activities and behavior,
which can be examined and represented as the Internal Behavior ($IB_{ij}$)
trust degree. The interaction ($I$) between the internal behavior of each
node and its trust degree in the table is represented as:


$$T_{ij} = I(IB_{ij}, T_{ij})$$


where $IB_{ij}$ is further mapped to a pair $(p, q)$ that represents the
rating allocated to node $j$ by node $i$ depending upon its recent
activity. In addition, $I(\cdot)$ is the trust-degree update that results
from interaction and communication among the nodes; it is responsible for
updating the trust degree of node $j$ depending upon its recent
communication behavior and activeness in the network.

Further, the proposed model uses the Bayesian formulation along with the
beta distribution method, inspired by the work of Ganeriwal et al. The
Bayesian formulation using the beta distribution scheme separates the
trustworthy and non-trustworthy nodes into $\alpha$ and $\beta$, where the
trust table can be further represented as:


$$T_{ij} = \mathrm{Beta}(\alpha_j + 1,\ \beta_j + 1)$$


where $\alpha_j$ and $\beta_j$ denote the trustworthy and non-trustworthy
communication interactions between node $i$ and node $j$.

Furthermore, the trust degree keeps updating using the Internal Behavior
($IB_{ij}$), where $p$ and $q$ are integers that increase or decrease the
rating of each communicating node in the network.
The updated trust degree of node $j$ maintained by node $i$ can be further
represented as:


$$T_{ij} = \mathrm{Beta}(\alpha_j + 1 + p,\ \beta_j + 1 + q) \tag{9.1}$$


where the values of the two parameters $\alpha_j$ and $\beta_j$ can be
further defined as:


$$\alpha_j = (e \cdot \alpha_j) + p, \qquad \beta_j = (e \cdot \beta_j) + q \tag{9.2}$$


where $e$ is the changing weight, with $e \in (0, 1)$.
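
The update rule in Eq. (9.2) can be illustrated with a short sketch. The class name `TrustEntry` and its methods are hypothetical and introduced only for illustration. Reducing the beta distribution to a single number by taking its mean, $(\alpha + 1)/(\alpha + \beta + 2)$, is an assumption made here for readability; the text itself only gives the distribution. The default `e = 0.95` is the value quoted in the simulation settings.

```python
from dataclasses import dataclass


@dataclass
class TrustEntry:
    """One row of node i's trust table, describing a neighbour j (illustrative)."""
    alpha: float = 0.0  # accumulated trustworthy interactions
    beta: float = 0.0   # accumulated non-trustworthy interactions

    def update(self, p: float, q: float, e: float = 0.95) -> None:
        """Apply Eq. (9.2): weight the old evidence by e, then add the rating (p, q)."""
        if not 0.0 < e < 1.0:
            raise ValueError("e must lie in the open interval (0, 1)")
        self.alpha = e * self.alpha + p
        self.beta = e * self.beta + q

    def trust(self) -> float:
        """Mean of Beta(alpha + 1, beta + 1); an assumed scalar summary of trust."""
        return (self.alpha + 1.0) / (self.alpha + self.beta + 2.0)


entry = TrustEntry()
print(entry.trust())        # no evidence yet
entry.update(p=1, q=0)      # one trustworthy interaction
print(entry.trust())
entry.update(p=0, q=1)      # one non-trustworthy interaction
print(entry.trust())
```

With no evidence the entry starts at a neutral value, a trustworthy rating raises it, and a later non-trustworthy rating lowers it again, while the weight `e` gradually discounts older interactions.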

The changed trust degrees of the communicating nodes can be kept under
surveillance and traced using the blockchain mechanism. The section below
describes the benefits of the blockchain while maintaining or changing the
trust degree of each communicating node.


Other classifiers such as KNN, Naive Bayes, etc., could be used instead of
the Bayesian formulation method to determine the legitimacy of each
communicating device; however, the mentioned performance metrics are
determined more accurately here because the trust values are updated
continuously.

The proposed framework outperforms because its trust degree changes with
each communication process, which updates the internal behavior of each
node in the network. An altered node can be easily identified by the
proposed approach because of its changed trust degree.
9.3.2 Blockchain Network


Algorithm 1
Begin
Step 1: Input: A set of communicating nodes in a network, N.
        Output: Device is trustworthy or non-trustworthy.
        Given: Various categories of nodes along with a blockchain network.
Step 2: a) Establish a network N where the trust degree of node j is
           maintained by node i, and so on.
        For all nodes in N = {1, 2, ..., n}
            For i = 1 to n
                Compute the trust degree of each node using the Bayesian
                formulation and beta distribution scheme
                If (device is trustworthy) then
                    Maintain a blockchain and keep surveillance
                    of its behavior
                Else
                    Node will not be allowed further communication
                    in the network
                End If
        b) Maintain a blockchain of trustworthy nodes in the network.
Step 3: Each trustworthy node is kept under surveillance, along with its
        trust degree, using the blockchain network.
            End For
        End For


The Bayesian formulation using the beta distribution scheme separates the
trustworthy and non-trustworthy nodes into $\alpha$ and $\beta$, from which
the trust table can be represented. Each node $i$ maintains a trust degree
for any node $j$ in its trust table $T$ as $T_{ij}$, which represents the
trust degree of node $j$ maintained by node $i$ in its trust table. In
addition, the trust degree keeps updating depending upon the node's
internal communication activities and behavior, which can be examined and
represented as the Internal Behavior ($IB_{ij}$) trust degree. The
blockchain network maintains a chain of blocks covering all the
communicating nodes in the network. Nodes that are communicating, and whose
weights change depending upon their internal behavior, can be easily traced
using the blockchain architecture. Each block holds the trust degrees of
the nodes and information about their neighboring nodes. The integration of
dynamic trust-degree updation with the blockchain network is illustrated in
Algorithm 1.
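
The classification step of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of the beta mean as the trust degree, and the `threshold` of 0.5 that separates trustworthy from non-trustworthy devices are all hypothetical, since the text only speaks of devices having "significant" trust values.

```python
from typing import Dict, List, Tuple


def beta_trust(alpha: float, beta: float) -> float:
    """Assumed scalar trust degree: mean of Beta(alpha + 1, beta + 1)."""
    return (alpha + 1.0) / (alpha + beta + 2.0)


def classify_nodes(
    evidence: Dict[int, Tuple[float, float]],
    threshold: float = 0.5,  # hypothetical cut-off, not given in the text
) -> Tuple[Dict[int, float], List[int]]:
    """Split nodes into trustworthy (kept under surveillance) and blocked.

    `evidence` maps a node id to its (alpha, beta) interaction counts.
    """
    trustworthy: Dict[int, float] = {}
    blocked: List[int] = []
    for node_id, (alpha, beta) in evidence.items():
        degree = beta_trust(alpha, beta)
        if degree >= threshold:
            trustworthy[node_id] = degree  # would be recorded on the blockchain
        else:
            blocked.append(node_id)        # not allowed further communication
    return trustworthy, blocked


trusted, blocked = classify_nodes({1: (8, 1), 2: (1, 6), 3: (0, 0)})
print(sorted(trusted), blocked)
```

In this example node 2, with mostly non-trustworthy interactions, falls below the assumed threshold and is blocked, while nodes 1 and 3 remain in the set that would be tracked on the ledger.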

Algorithm 1 establishes a device's legitimacy by computing trust values
during data transmission in the network. The trust values are computed
using the Bayesian formulation to capture the behavior of each device.
Further, in order to maintain continuous surveillance of the communicating
nodes, the devices are maintained in a blockchain network. The word dynamic
is used with updation because the trust degrees change upon each
communication between node i and node j. The trust values change based upon
the nodes' internal communication behavior and activeness ratio in the
network. The main aim of this manuscript is to ensure security among the
users that utilize the network services and to secure the stored
information. The flowchart of the proposed mechanism is presented in
Figure 9.3. It explains the proposed mechanism, in which devices
communicate while their trust values are computed. Devices that have
significant trust values are defined as legitimate and are allowed to
communicate and transmit information in the network.

Figure 9.3 Flowchart of proposed mechanism [26].

In order to reduce storage overhead and to improve the efficiency of the
network, we maintain a blockchain for different purposes. The size of the
data in a communicating device is taken as 128 bits, which is further
checked by determining the trust values using the Bayesian formulation
scheme. The network is updated after a specific interval for continuous
assessment, which is maintained through the blockchain. If the trust degree
information sent by other nodes is corrupted, a legitimate device may for
the time being be identified as altered. However, its previous interaction
history and its continued behavior in the network while forwarding
information may subsequently update the trust degree of that device. In
addition, all the devices along with their trust degrees are added to the
blockchain, which keeps the entire history of their trust values while
communicating information in the network.
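
The idea of keeping the full history of trust degrees on a blockchain can be sketched with a simple hash-chained ledger. This is only an illustration of the tamper-evidence property: the class `TrustLedger` and its fields are hypothetical, and the sketch runs in a single process without any consensus protocol, peer-to-peer network or mining, all of which a real blockchain deployment would need.

```python
import hashlib
import json
from typing import Any, Dict, List


def block_hash(body: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of the block body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class TrustLedger:
    """Append-only, hash-chained record of trust degrees (illustrative only)."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.chain: List[Dict[str, Any]] = []

    def append(self, node_id: int, trust_degree: float) -> Dict[str, Any]:
        prev_hash = self.chain[-1]["hash"] if self.chain else self.GENESIS
        body = {
            "index": len(self.chain),
            "node_id": node_id,
            "trust_degree": trust_degree,
            "prev_hash": prev_hash,
        }
        block = dict(body, hash=block_hash(body))
        self.chain.append(block)
        return block

    def is_valid(self) -> bool:
        """Recompute every hash and check each link to the previous block."""
        prev_hash = self.GENESIS
        for block in self.chain:
            body = {k: v for k, v in block.items() if k != "hash"}
            if block["prev_hash"] != prev_hash or block["hash"] != block_hash(body):
                return False
            prev_hash = block["hash"]
        return True

    def history(self, node_id: int) -> List[float]:
        """All trust degrees recorded for one node, oldest first."""
        return [b["trust_degree"] for b in self.chain if b["node_id"] == node_id]


ledger = TrustLedger()
ledger.append(1, 0.80)
ledger.append(2, 0.30)
ledger.append(1, 0.85)
print(ledger.is_valid(), ledger.history(1))
```

Because each block stores the hash of its predecessor, changing a recorded trust degree after the fact makes `is_valid()` fail, which is the property the text relies on when it describes continuous surveillance of trust values.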


9.4 Performance Analysis 


The proposed mechanism, which integrates trust degrees with a blockchain
network, is implemented and verified using MATLAB simulation. A network
size of 50 nodes is considered for validating the proposed scenario against
a traditional approach over various security metrics.


9.4.1 Dataset Description and Simulation Settings 


A synthesized dataset, in which some nodes are selected as trustworthy and
some are externally made to act as non-trustworthy, is used to measure the
outperformance of the network. The simulation is run over an area of
700 m × 700 m, with the number of communicating nodes varied from 5 to 50,
a running time of 60 s and a channel capacity of 2.5 Mbits. In addition,
the IEEE 802.11 MAC protocol is used to maintain the communication, with
the initial values of $\alpha$ and $\beta$ set to 0 and the value of $e$
set to 0.95 during network establishment. Initially, the trust value is
randomly distributed in (0, 1) and is then updated by computing the trust
degree of each communicating node in the network.
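
The simulation settings listed above can be collected in a small configuration object. The sketch below is in Python rather than MATLAB and is only meant to make the parameters explicit; the names `SimulationSettings` and `initial_trust` are hypothetical, and the seeded random generator is an assumption added for reproducibility.

```python
import random
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SimulationSettings:
    """Parameters quoted in the text; field names are illustrative."""
    area_m: Tuple[int, int] = (700, 700)
    min_nodes: int = 5
    max_nodes: int = 50
    run_time_s: int = 60
    channel_capacity_mbits: float = 2.5
    mac_protocol: str = "IEEE 802.11"
    alpha0: float = 0.0
    beta0: float = 0.0
    e: float = 0.95


def initial_trust(settings: SimulationSettings, num_nodes: int, seed: int = 0) -> Dict[int, float]:
    """Draw a random initial trust value for each node.

    random() samples the half-open interval [0, 1), which approximates the
    open interval (0, 1) mentioned in the text.
    """
    if not settings.min_nodes <= num_nodes <= settings.max_nodes:
        raise ValueError("num_nodes outside the simulated range")
    rng = random.Random(seed)
    return {node: rng.random() for node in range(num_nodes)}


settings = SimulationSettings()
trust = initial_trust(settings, num_nodes=10, seed=42)
print(len(trust), settings.e)
```

Keeping the settings in one frozen object makes it easy to rerun the same scenario for each node count between 5 and 50.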


9.4.2 Traditional Method and Evaluation Metrics 


The proposed framework is analyzed against Zhong et al. [8] (referred to as
the baseline approach), in which the authors proposed an index confirmation
process for identifying eavesdropping in a wireless communication process.
The authors detailed the concept by deducing the mathematical curves and
formulas on various classification regions. The baseline framework is
claimed to reduce the eavesdropping probability compared with the existing
method.

In addition, the evaluation metrics used to verify the outperformance of
the proposed framework over the existing work are node alteration rate,
accuracy and delay. Each metric is defined below:


Alteration rate: The number of nodes that can be easily altered from
trustworthy to non-trustworthy by intruders during the communication
process in the network.

Accuracy: A measure of how much time and effort an approach needs to
accurately determine the trustworthiness or internal behavior of each
communicating node.

Delay: The amount of time required to determine the ideal nature or
activity of each communicating node in the network.
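
One possible way to compute the three metrics from simulation output is sketched below. The definitions in the text are informal, so several choices here are assumptions: the alteration rate is normalized by the number of initially trustworthy nodes, accuracy is taken as the usual fraction of correctly classified nodes (a standard definition swapped in for the effort-based wording above), and delay is averaged over nodes. All function names are hypothetical.

```python
from typing import Dict, Sequence


def alteration_rate(before: Dict[int, bool], after: Dict[int, bool]) -> float:
    """Fraction of initially trustworthy nodes that end up non-trustworthy."""
    trusted = [node for node, ok in before.items() if ok]
    if not trusted:
        return 0.0
    altered = sum(1 for node in trusted if not after.get(node, False))
    return altered / len(trusted)


def classification_accuracy(predicted: Dict[int, bool], actual: Dict[int, bool]) -> float:
    """Fraction of nodes whose predicted label matches the ground truth."""
    if not actual:
        return 0.0
    correct = sum(1 for node, label in actual.items() if predicted.get(node) == label)
    return correct / len(actual)


def mean_delay(identification_times_s: Sequence[float]) -> float:
    """Average time (in seconds) needed to identify a node's behavior."""
    if not identification_times_s:
        return 0.0
    return sum(identification_times_s) / len(identification_times_s)


before = {1: True, 2: True, 3: False, 4: True}
after = {1: True, 2: False, 3: False, 4: False}
print(alteration_rate(before, after))
print(classification_accuracy(after, {1: True, 2: False, 3: True, 4: False}))
print(mean_delay([0.2, 0.4]))
```

Expressing the metrics as ratios and averages makes runs with different node counts comparable, which matters when the number of nodes is varied from 5 to 50.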


9.5 Results Discussion 


Figure 9.4 presents the graph of alteration rate, which clearly shows the
outperformance of the proposed network over the existing approach. This
outperformance is due to the trust degree changing upon each communication
process, which updates the internal behavior of each node in the network.
An altered node can be easily identified by the proposed approach because
of its changed trust degree. Figure 9.5 depicts the accuracy rate of the
proposed framework against the existing mechanism.

Figure 9.5 Accuracy rate.

The accuracy of the trust degree is due to the blockchain integration, in
which nodes are integrated depending upon their activeness in the
environment. Nodes with a higher trust degree provide more accurate results
and are always under surveillance through the blockchain network.

Nodes with lower trust degrees never take part in the communication process
and are never part of the blockchain network. Finally, Figure 9.6 shows the
delay, that is, the amount of time required to determine the accuracy and
trustworthiness of a node in the network. Nodes with higher trust degrees
are also part of the blockchain network and are kept under surveillance by
the environment. The delay in identifying trustworthy nodes in the proposed
mechanism is much lower than in the existing approach because of the
involvement of the blockchain network.
9.6 Empirical Analysis


The performance can be further measured using an empirical analysis, in
which the performance metrics are analyzed based upon their complexity and
communication delay. The list below gives a number of empirical analysis
criteria for further analyzing the proposed and existing methods.


• Energy consumption: The transmission of information and communication
  among the devices leads to consumption of energy. The devices consume an
  amount of energy while communicating and emit the energy loss into the
  environment.

• Communication overhead: The communication overhead is considered another
  significant metric when analyzing the performance of a network. A large
  number of security schemes or methods may result in heavy computation
  that may further delay the response of the network.

• Response delay: The response delay metric determines how much time a
  system needs to respond to the requested input from the user. A device
  with heavy communication and communication overhead may suffer a large
  delay and a late response in the network.

• Falsification: Information falsification occurs where intruders
  successfully hack the IDs of legitimate devices and try to authenticate
  themselves in the network for their own benefit.
All these metrics can be further analyzed for both the proposed and
existing approaches to measure the overall outperformance of the proposed
mechanism. Considering all the metrics, the proposed mechanism is expected
to perform better than the existing approach because it does not impose
heavy communication and computation loads on the respective devices. The
legitimacy of each device is analyzed using one single approach, which
reduces the energy consumption and overhead at the devices. In addition,
the blockchain maintenance may further reduce the response delay and the
falsification threat because of continuous surveillance in the network.


9.7 Conclusion


A dynamic, continuously updated trust-based communication and secure
transmission process among wireless nodes is presented in this paper. The
trust degree of each communicating node is determined using a Bayesian
function with a beta distribution. In addition, the communicating nodes are
kept under surveillance by the blockchain network to increase transparency
in the network. The proposed mechanism is validated and verified against
the existing approach over various security metrics, such as alteration
rate, accuracy and delay, by communicating information among various nodes
in the network. The outperformance of the proposed framework can be clearly
seen in the graphs and computed results obtained using MATLAB simulation.
In future work, the proposed framework can be extended to cover other
security threats related to wireless communication, such as Sybil attacks
and denial of service, and metrics such as throughput and end-to-end delay.
Abstract
With the ever-growing advance of technology, the newer generation always
has its eyes on the next level of every existing technology. Along with the
improvement of 5G technologies, industry and academia have begun to look
toward the 6th generation wireless network technology (6G), which is
anticipated to have more numerous and more capable threat-coping
mechanisms. As the deployment of 5G networks has grown over the years, the
number of limitations has also increased significantly. 6G networks are
expected to have extended connectivity that the current generation might
not even have thought of. Upcoming and cutting-edge technologies, including
post-quantum cryptography, artificial intelligence (AI), machine learning
(ML), enhanced edge computing, molecular communication, terahertz (THz)
communication, visible light communication (VLC), and distributed ledger
(DL) technology such as blockchain, are said to form the backbone of 6G
networks. One critical element in the realization of 6G will be security.
Novel authentication, encryption, access control, communication and
malicious activity detection techniques are important to ensure the
trustworthiness and privacy of future networks. Building on 5G techniques,
6G will have a profound impact on ubiquitous connectivity, holographic
connectivity, deep connectivity and practical connectivity. As 6G is
expected to become a more open network than 5G, the boundary between the
inside and the outside of the network may become increasingly blurred.
Therefore, current network security methods, such as IPsec, firewalls,
intrusion detection systems (IDSs), etc., that enforce security at the
network edge will not be robust enough. To mitigate this limitation, 6G
security design must support the fundamental security principle of
Zero Trust (ZT) within the mobile communication network.
10.1 Introduction


6G (sixth generation) is the upcoming generation of wireless technology
for cellular networks, using higher frequencies and offering a larger
coverage area. It is not yet deployed at the user level, but research is
already under way toward a more ubiquitous and reliable internet presence
across all cellular networks. 6G networks will be able to operate at
higher frequencies than 5G networks. This means that the way transceivers
are manufactured will also change, which will result in a new
architecture for 6G networks. As a rule, a higher transmission rate
requires a denser distribution of cells over a network. It can be
anticipated that 6G will build on the capabilities of 5G and improve on
them roughly tenfold, delivering ultra-fast speeds, greater device
capacity and next-to-no latency. A fundamental ingredient of the 6G
network is that it is expected to selectively use different frequencies,
measuring absorption and adjusting frequencies accordingly. This is
possible because the atoms and molecules of each substance emit and
absorb at characteristic frequencies, and the emission and absorption
frequencies of a given substance are the same. 6G will have big
implications for many government and industry approaches to public safety
and critical asset protection, and will draw on technologies such as the
terahertz (THz) band, AI, optical wireless communication (OWC), 3D
networking, unmanned aerial vehicles (UAVs), and wireless power transfer.
6G networks are expected to deliver superior performance, since they are
evaluated on more metrics, and against higher targets, than 5G networks.
In India, n77 and n78 are the most popular 5G bands supported by
smartphones, while the higher 5G bands range between 24 and 47 GHz. This
provides a maximum data throughput of 18-20 Gbps for 5G [6]. Because 6G
networks operate across many frequencies, including THz and optical
frequency bands, this speed might be improved substantially. With these
high frequency bands, the data rate can reach the Gbps range at the user
level. As a result, the area traffic throughput can exceed 1 Gbps/m². To
support the 100-fold increase in data rate, spectrum efficiency can grow
by 3-5 times, while energy efficiency must increase by more than 100
times. The use of artificial intelligence (AI) can help with the
administration of such frequency bands and networks. Due to the use of
highly heterogeneous networks, different communication scenarios, vast
numbers of antennas, and broad bandwidths, the connection density will
increase by 1,000 times. Because of satellites, unmanned aerial vehicles,
and ultra-high-speed railways, supported mobility will be much higher
than 500 km/h. When comparing its performance to that of other networks,
several indicators such as network security, storage, range and coverage,
cost efficiency and sustainability can be taken into account [7].


10.2 Evolution of 6G 


The mobile communication sector has advanced dramatically, particularly
in transmission technologies and frequency bands, from the first mobile
system to the impending 6G mobile system. Each generation has its own
distinctive characteristics, techniques, and capabilities.

In the early 1980s, 1G cellular networks relying on analog transmission
for voice services were introduced, following the cellular system that
Nippon Telegraph and Telephone (NTT) launched in Tokyo, Japan, in 1979.
Europe launched its cellular system two years later. The most well-known
analog systems were the Total Access Communication System (TACS) and
Nordic Mobile Telephone (NMT). The main problems with 1G networks stemmed
from the use of analog signals for transmission: weak security, poor call
quality, excessive power consumption and inadequate data capacity.

In 1991, the second-generation mobile network (2G) was introduced. In
principle, this cellular network comprises a globally integrated
distribution of multiple base stations (BSs) to which users connect
through multiple access schemes (FDMA, CDMA, and TDMA). 2G technologies
include the Global System for Mobile Communications (GSM), 2.5G General
Packet Radio Service (GPRS), and 2.75G Enhanced Data Rates for GSM
Evolution (EDGE), and they rely on digital compression/decompression
methods (codecs, etc.). Compared to 1G, 2G offers superior cellular
services and digitally protected data transmission [8].

With the introduction of the third generation (3G), the mobile network
continued to evolve. As connection speeds increased, the network extended
beyond traditional mobile phones to other devices (e.g., computers,
gaming consoles, and tablets). CDMA2000, Wideband Code Division Multiple
Access (WCDMA), and Time Division Synchronous Code Division Multiple
Access (TD-SCDMA) are the three essential technologies for the 3G
network.

After that, 4G was deployed. The fourth generation is associated with the
term “MAGIC”, which stands for “mobile multimedia anywhere, global
mobility solutions over integrated wireless and tailored services.” Users
can enjoy smooth network access and end-to-end IP transmission, as well as
QoS management with greater service quality, mobility, and a data transfer
rate of 20 Mbps.

After the first complete set of 5G specifications was established, the
commercial deployment of 5G began in 2019. The advent of 5G signals the
start of a global digital era, with groundbreaking wireless technology
standards for data transfer rates, latency, mobility, and the number of
connected devices. These characteristics distinguish 5G from its
forerunners, and 5G denotes the next step in the evolution of
communication networks. Given the rapid technological breakthroughs and
inventions of the last decade, going beyond 5G to meet expanding
technical needs and demands at all levels is unavoidable.

The first commercial 6G system is expected to be launched in 2030. By
that year, a global digital civilization powered by improved and
practically instantaneous wireless communication is envisioned. 6G is
envisioned as a self-governing system that may emulate aspects of human
intelligence and consciousness and that provides a variety of ways to
communicate and interact with smart terminals (for example, through brain
waves or neurological signals, eyes, fingers, and voice) [9].

As a vision for the future, and because 6G can use a much higher spectrum
than its predecessors, a 6G network with a multiband spectrum will
provide connections of hundreds of Gbit/s up to Tbit/s. For example, this
combination uses the 1-3 GHz band, the 30-300 GHz band for millimeter
waves, and the 0.06-10 THz band for THz. The evolution of cellular
networks from 2G to 5G was designed to serve people. That is, it aimed to
bring network delays within human response times, such as visual response
time (10 ms), auditory response time (100 ms), and perceptual response
time (1 ms).

With the sixth generation, wireless evolution is expected to shift
substantially from connected things to connected cognition. Furthermore,
the delivery of ubiquitous AI services from the network’s core to end
devices necessitates 6G. In other words, artificial intelligence (AI)
will be crucial in the creation and optimization of 6G protocols,
infrastructures, and operations. The table below summarizes mobile
networks through the generations.
Network type | Year of launch | Functions/Application

1G mobile network | early 1980s | a) Used an analogue transmission signal that could only handle voice services, at a speed of up to 2.4 kbps.
 | | b) Relied on analogue transmission for speech services.
 | | c) Used a frequency range of 800-900 MHz, a bandwidth of 40 MHz, and a channel capacity of 30 kHz.
 | | d) Frequency division multiplexing was utilized in the first generation.

2G mobile network | 1991 | a) TDMA and CDMA were used in the second generation.
 | | b) Offered more services, such as “short message service” (SMS) and “multimedia messaging service” (MMS), as well as higher service quality.
 | | c) Enhanced to work in the 850-1900 MHz frequency band with a data rate of up to 64 kbps.

3G mobile network | 2003 | a) High-quality internet access.
 | | b) Improved security by allowing users to connect to other wireless devices using user authentication capabilities.
 | | c) Defined by IMT-2000 technical specifications that include features of reliability and speed, namely a data transfer rate of at least 200 kbps.

4G mobile network | 2009 | a) Allowed users to connect to the network anytime and from any location.
 | | b) Allowed users to have smooth connectivity and improved service quality.

5G mobile network | 2019 | a) Users can expect considerable benefits from 5G, including data transfer rates of up to 10 Gbps and significantly lower latency (almost 10 ms) at larger capacity, reliability, and QoS.
 | | b) 5G is the first to use the mmWave band, a brand-new frequency band technology.
 | | c) Provides a single platform for a variety of applications, including improved mobile broadband communications, automated driving, virtual reality, and the Internet of Things.

6G mobile network | 2030 (expected) | a) 6G will play a key role in merging seamless wireless connectivity with numerous technology functions that support full-vertical applications.
 | | b) 6G will dramatically improve the data rate, up to 100-1000 times faster than 5G.
 | | c) In terms of capacity, 6G aims to sharply increase the capacity by up to 1000 times more than 5G.
 | | d) 6G will provide latency as low as 10-100 µs.


10.3 Functionality


For faster speeds and data rates, the radio frequency (RF) spectrum used
in 6G has been extended into the THz range. THz waves span frequencies of
0.1 to 10 THz, corresponding to wavelengths of 3000 down to 30 microns.
This allows for high transmission rates and wide broadband access, which
might be useful in future mobile communication systems.
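The relationship between the quoted frequency range and wavelength range
follows from the free-space relation wavelength = c / f. The short sketch
below checks the two band edges; the function name `wavelength_um` is
illustrative and not taken from the source.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength_um(freq_thz: float) -> float:
    """Free-space wavelength in micrometres for a frequency given in THz."""
    if freq_thz <= 0:
        raise ValueError("frequency must be positive")
    freq_hz = freq_thz * 1e12
    return C / freq_hz * 1e6  # metres -> micrometres


# Band edges quoted in the text: 0.1 THz and 10 THz.
for f in (0.1, 1.0, 10.0):
    print(f"{f:5.1f} THz -> {wavelength_um(f):8.1f} um")
```

The lower edge of the band (0.1 THz) comes out at roughly 3000 microns and
the upper edge (10 THz) at roughly 30 microns, matching the range above.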

Furthermore, the THz band can support very small cells, from the
nanoscale to the micrometer scale, with coverage of up to about 10 m,
without sacrificing communication speed. Various techniques, such as
holographic beamforming, employ antenna arrays to send and receive a
focused narrow beam with very high gain. This is accomplished by
concentrating power in a restricted angular range. Beamforming increases
the signal-to-interference-plus-noise ratio (SINR), improving coverage
and throughput, and the narrow beams may also be used to track users.
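To make the idea of concentrating power in a restricted angular range
concrete, the following is a minimal sketch of conventional phased-array
beamforming with a uniform linear array. It is not the holographic
beamforming technique named in the text. The half-wavelength element
spacing, the 64-element array and the function name `array_gain` are
assumptions chosen for illustration.

```python
import cmath
import math


def array_gain(n_elements: int, steer_deg: float, look_deg: float,
               spacing_wavelengths: float = 0.5) -> float:
    """Power gain of a uniform linear array, relative to one element fed
    with the same total power. Angles are measured from broadside."""
    k_d = 2 * math.pi * spacing_wavelengths
    steer = math.sin(math.radians(steer_deg))
    look = math.sin(math.radians(look_deg))
    # Each element applies a phase shift that cancels the path difference
    # toward steer_deg, so the phasors add coherently only in that direction.
    total = sum(cmath.exp(1j * k_d * n * (look - steer))
                for n in range(n_elements))
    return abs(total) ** 2 / n_elements


on_beam = array_gain(64, steer_deg=20.0, look_deg=20.0)
off_beam = array_gain(64, steer_deg=20.0, look_deg=30.0)
print(f"gain toward the user: {on_beam:.1f}, 10 degrees off: {off_beam:.4f}")
```

In the steered direction the gain equals the number of elements, while a
receiver ten degrees away sees only a small fraction of that power, which
is the mechanism behind both the SINR improvement and the reduced exposure
to eavesdroppers.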

As a result, the 6G network's positioning accuracy improves dramatically.
Quantum communication (QC) is another technology that 6G is expected to
utilize. According to quantum principles, data encoded in a quantum state
(using photons or other quantum particles) cannot be read or copied
without disturbing the data (e.g., because of the correlation of entangled
particles and the no-cloning theorem).

In addition, the superposition property of qubits allows for a faster data 
transfer rate in QC. Quantum key distribution, quantum secret sharing, 
quantum teleportation, and quantum secure direct communication are 
now all possible with QC. One of the most important features of this tech- 
nology is its capacity to greatly improve data security and reliability. 
The system design and speed will be governed by edge intelligence. The
main aim of 6G is to provide greater speeds using THz communications,
which limits the communication range even further. As a result, in order
to connect the objects in an environment, all devices must obtain
services from a nearby access point (AP). On one side of the AP there is
a very high-bandwidth, high-speed link, while on the other side there is
a lower-bandwidth, lower-speed link. As a result, edge computing
capability must be improved. Hence, 6G is expected to drive toward more
robust edge capabilities to handle these communication mismatches, as
well as a new basic architecture with more intelligent edge computing,
enabled by AI approaches.

Energy efficiency is another aspect of 6G that should be examined. The
intelligent reflecting surface (IRS) is a novel wireless communication
concept. It has recently been seen as a potential technology capable of
significantly lowering the energy consumption of wireless networks.
Because it is simple to install and uses very little energy,
IRS-assisted communication could be used to dramatically improve the
coverage of 6G networks. In outdoor urban environments, intelligent
reflecting surfaces can be mounted on the external walls of buildings to
reduce energy consumption. This allows 6G network coverage to be expanded
while simultaneously increasing energy efficiency.

Given the widespread use of the 5G mobile network, the technological
capabilities of mobile devices should be compatible with the upcoming 6G
features. Because some of the new 6G features are incompatible with 5G
devices, upgrading the technological capabilities of mobile devices for
6G may result in higher costs. Individual devices, moreover, struggle to
accommodate speeds of 1 Tbps.


10.3.1 Security and Privacy Issues 


6G networks are expected to provide high reliability, low latency, and
secure and efficient transmission services. However, most of the enabling
technologies come at the cost of new security and privacy concerns [1].


10.3.1.1 Artificial Intelligence (AI) 


AI is commonly regarded as one of the major components of the future
network architecture, compared with all the other technologies projected
to be deployed in 6G networks [10]. Artificial intelligence has attracted
a great deal of attention in the field of networking. As a result of this
increased focus, a growing number of new security and privacy issues have
emerged. Although in 5G networks AI is generally operated in isolated
locations where huge amounts of training data and powerful but private
processing hubs are available, AI will become a more central component of
the 6G network. AI technologies serve two architectural levels: the
physical layers, which comprise elements such as data cables and network
infrastructure, and the computational layers, which include
software-defined networking, network function virtualization,
cloud/edge/fog computing, and so on.


Physical Layers 

Many AI-based techniques, such as deep neural networks and
supervised/unsupervised learning, can be applied at the physical layer.
These approaches can not only increase physical layer performance by
improving connectivity, but can also forecast traffic and improve
security. Unsupervised learning methods could be employed in the
authentication process to improve physical layer security. To prevent
information leakage, machine learning-based antenna designs can be
utilized in physical layer communication. Machine learning and quantum
encryption algorithms have also been proposed to protect the security of
communication links in 6G networks.


Network Architecture 

In terms of network architecture, AI may improve edge security through
security systems and fine-grained controls. AI technologies, specifically
deep learning, might also be used to detect threats in edge computing.
However, the concept has to be investigated further.


Figure 10.1 Physical layer in artificial intelligence [2]. (Diagram
blocks: AI algorithm block; sub physical layer block.)
Figure 10.2 Network architecture for better security [2]. (Diagram:
Alice's Tor client contacts the Tor network to obtain a random path to
the destination server; an AES-encrypted HTTP request travels to and from
the Tor network through a guard node and relay nodes; the server receives
plain HTTP packets with the destination address; for obfuscation,
multiple guard and exit nodes exist.)


Other Functions 

AI is useful in various areas besides the physical layers and network
architecture, such as big data analysis, distributed AI, resource
management, and network optimization (Figures 10.1 and 10.2). AI has been
shown to aid in the detection of network anomalies and the provision of
early warning measures to improve the security of 6G networks. Deploying
distributed and federated AI in a 6G network also removes the need for
edge devices to share their raw data, further enhancing network security.
On the other hand, data correlation in various machine learning
algorithms has been shown to increase privacy leakage.
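The claim that federated AI spares edge devices from communicating their
data can be illustrated with federated averaging. The sketch below is a
toy example under stated assumptions: three hypothetical devices each hold
a few private samples of the line y = 1 + 2x, train a two-parameter linear
model locally, and send only the model weights to the aggregator. All
names, data and hyperparameters are illustrative, not taken from the
source.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair held privately by one device


def local_update(weights: Sequence[float], data: Sequence[Sample],
                 lr: float = 0.1, epochs: int = 20) -> List[float]:
    """Gradient descent for y ~ w0 + w1*x using only this device's data."""
    w0, w1 = weights
    for _ in range(epochs):
        g0 = sum(w0 + w1 * x - y for x, y in data) / len(data)
        g1 = sum((w0 + w1 * x - y) * x for x, y in data) / len(data)
        w0, w1 = w0 - lr * g0, w1 - lr * g1
    return [w0, w1]


def federated_average(updates: Sequence[Sequence[float]],
                      sizes: Sequence[int]) -> List[float]:
    """Aggregate local models, weighting each by its number of samples."""
    total = sum(sizes)
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(len(updates[0]))]


def train(devices: Sequence[Sequence[Sample]], rounds: int = 40) -> List[float]:
    global_w = [0.0, 0.0]
    for _ in range(rounds):
        # Only weights cross the network; the samples never leave a device.
        updates = [local_update(global_w, d) for d in devices]
        global_w = federated_average(updates, [len(d) for d in devices])
    return global_w


devices = [
    [(x, 1 + 2 * x) for x in (-2.0, -1.0, 0.0)],
    [(x, 1 + 2 * x) for x in (0.0, 1.0, 2.0)],
    [(x, 1 + 2 * x) for x in (-1.0, 1.0, 3.0)],
]
weights = train(devices)
print("global model (intercept, slope):", weights)
```

The aggregated model approaches the intercept 1 and slope 2 even though
no device ever reveals its samples. As the text notes, the weights
themselves can still leak information, so this reduces rather than
eliminates privacy risk.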


10.3.1.2 Molecular Communication


Molecular communication is a natural phenomenon observed among living
organisms with nanoscale structures. Microscale and nanoscale devices are
becoming a reality because of advances in nanotechnology, bioengineering,
and synthetic biology over the last decade. Furthermore, the energy
required to generate and propagate a molecular communication signal is
negligible. Although this phenomenon has been studied for many years in
biology, it has only recently become a research topic in the field of
communication. Molecular communication is a very promising technology for
6G communications. It is, however, a multidisciplinary technique that is
still in its infancy. The core concept of molecular communication is the
transmission of information via biochemical signals. Mobile molecular
communication, which allows the transmitter, receiver, and associated
nodes to communicate while moving, has also been demonstrated.

Several security and privacy concerns relating to the communication,
authentication, and encryption processes, however, have already been
identified. Only a few researchers have looked into the security of
molecular communication links, even though it is known that this form of
communication channel may be disrupted by an adversary.


10.3.1.3 Quantum Communication


Another communication technology with a lot of potential in 6G networks
is quantum communication. One of its key advantages is that it may
considerably improve the security and reliability of data transfer. If an
adversary eavesdrops on, measures, or copies anything in quantum
communication, the quantum state will be altered. As a result, the
recipient cannot fail to notice the interference. In theory, quantum
communication could provide absolute security, and with the right
breakthroughs it could be well suited to long-distance communication. It
provides many new features and raises communication to a level that older
systems cannot match. Quantum communication, on the other hand, is not
yet a panacea for all security and privacy concerns. Although tremendous
progress has been made in establishing quantum cryptography for quantum
communication, long-distance quantum communication remains a substantial
challenge due to fiber attenuation and operation errors. To achieve fully
secure quantum communications, several different forms of quantum
encryption and other approaches, such as quantum key distribution,
quantum secret sharing, quantum secure direct communication, quantum
teleportation, and quantum dense coding, may be necessary. Furthermore,
further study is needed on the security of quantum secure direct
communication, which allows secret messages to be transmitted directly
over a quantum channel without the use of a private key. Some quantum
protocols that use quantum key distribution models to protect key
security have also been discussed.
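Why an eavesdropper cannot go unnoticed can be shown with a purely
classical simulation of the BB84 quantum key distribution protocol under
an intercept-and-resend attack. BB84 is used here as a representative
example; the source does not name a specific protocol. The sketch models
only the measurement statistics (a measurement in the wrong basis gives a
random bit) and assumes an ideal, noiseless channel.

```python
import random


def bb84_error_rate(n_bits: int, eavesdrop: bool, seed: int = 1) -> float:
    """Fraction of mismatched bits in the sifted key shared by Alice and Bob."""
    rng = random.Random(seed)
    errors = kept = 0
    for _ in range(n_bits):
        bit, a_basis = rng.getrandbits(1), rng.getrandbits(1)
        sent_bit, sent_basis = bit, a_basis
        if eavesdrop:
            e_basis = rng.getrandbits(1)
            # Measuring in the wrong basis destroys the encoded value.
            e_bit = sent_bit if e_basis == sent_basis else rng.getrandbits(1)
            sent_bit, sent_basis = e_bit, e_basis  # Eve resends what she saw
        b_basis = rng.getrandbits(1)
        b_bit = sent_bit if b_basis == sent_basis else rng.getrandbits(1)
        if b_basis == a_basis:  # sifting: keep rounds where the bases match
            kept += 1
            errors += int(b_bit != bit)
    return errors / kept if kept else 0.0


print("error rate without eavesdropper:", bb84_error_rate(20000, False))
print("error rate with eavesdropper:   ", bb84_error_rate(20000, True))
```

Without an eavesdropper the sifted keys agree exactly, whereas
intercept-and-resend introduces errors in about a quarter of the sifted
bits. Alice and Bob detect the attack by comparing a random sample of
their key over a public channel.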


10.3.2 Blockchain


In a 6G network, blockchain technology offers many potential
applications. Network decentralization, distributed ledger systems, and
spectrum sharing are among the examples. Network decentralization based
on blockchain technology has the potential to improve network
administration and performance. The same can be said for the use of
blockchain in distributed ledger technologies, which would greatly
improve authentication security. In fact, blockchain has the potential to
be one of the most disruptive Internet of Things technologies.
Furthermore, by incorporating blockchain technology into a spectrum
sharing scheme, issues such as low spectrum utilization and spectrum
monopoly could be addressed while also safeguarding spectrum use.

Access control, authentication, and communication mechanisms all play a
role in blockchain security and privacy. A blockchain radio access
network design has been proposed that can protect and manage network
access and authentication among trustless network elements. There is also
a new conceptual architecture for mobile service authorization based on
blockchain technology. A method has also been described for using the
blockchain to improve the security of media access protocols and
cognitive radio when acquiring access to unlicensed spectrum.
Furthermore, although the decentralized architecture means that an
attacker can change records only by controlling a majority of the nodes
(the so-called 51% attack), which suggests that the network is reasonably
secure, there is no trusted third party responsible for secure data
storage and management when security breaches occur. The hash power
required to validate transactions in a blockchain-based network has also
been found to have a negative impact on security.
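The tamper evidence that underlies these blockchain applications comes
from chaining record hashes. The following minimal sketch omits consensus,
mining and networking: each block stores the hash of its predecessor, so
altering any record invalidates the chain from that point onward. The
access-log payloads and function names are hypothetical.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # placeholder predecessor hash for the first block


def block_hash(index: int, prev_hash: str, payload: str) -> str:
    record = json.dumps({"index": index, "prev": prev_hash, "payload": payload},
                        sort_keys=True)
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def build_chain(payloads: List[str]) -> List[dict]:
    chain, prev = [], GENESIS
    for i, payload in enumerate(payloads):
        digest = block_hash(i, prev, payload)
        chain.append({"index": i, "prev": prev, "payload": payload,
                      "hash": digest})
        prev = digest
    return chain


def first_invalid_block(chain: List[dict]) -> Optional[int]:
    """Return the index of the first block that fails verification, if any."""
    prev = GENESIS
    for block in chain:
        expected = block_hash(block["index"], block["prev"], block["payload"])
        if block["prev"] != prev or block["hash"] != expected:
            return block["index"]
        prev = block["hash"]
    return None


# Hypothetical access-control log entries.
chain = build_chain(["UE-17 authenticated", "UE-17 granted band A",
                     "UE-42 authenticated"])
print("intact chain:", first_invalid_block(chain))
chain[1]["payload"] = "UE-17 granted band B"  # an attacker edits one record
print("after tampering:", first_invalid_block(chain))
```

In a real network an attacker could recompute the later hashes as well,
which is why the majority-control condition discussed above matters:
honest nodes holding the original chain would reject the rewritten copy.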


10.3.3 TeraHertz Technology 


Despite their widespread use in 5G networks, mm-wave bands are
insufficient in the 6G environment because of the need for very high
transmission rates. In any event, the radio frequency (RF) band is nearly
full and leaves little room for future technologies. These considerations
have accelerated the development of terahertz technology. Terahertz
communication uses the 0.1-10 THz range, which has more spectrum
resources than the mm-wave band. It also combines characteristics of
microwaves and light waves. There are several advantages to adopting the
THz band. To begin with, THz communication technology may be capable of
supporting data rates of up to 100 Gbps. Second, eavesdropping would be
restricted by the narrow beam and short pulse duration employed in THz
communication, resulting in increased communication security. Third, THz
waves have very low attenuation through certain materials, which means
they could be used in a wide range of applications. Furthermore, THz
transmission can be highly directional, reducing intercell interference
dramatically.
The energy consumption of THz communication has been identified as a
significant issue. The size of 6G cells must be reduced from “small” to
“tiny,” necessitating the development of more complicated hardware and
architectures. THz, like any other technology, has its own set of
security and privacy concerns. The majority of these are concerned with
authentication and malicious behavior. Concepts such as the
electromagnetic signature of THz frequencies, for example, could be
employed in physical layer authentication methods. Furthermore, while THz
communication is usually thought to make eavesdropping harder, a signal
transmitted via narrow beams could still be intercepted by an
eavesdropper. Ways to defend against such an eavesdropping attack have,
however, also been discussed.
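One way to see why THz cells must shrink is the free-space path loss
(Friis) formula, FSPL = 20 log10(4 pi d f / c). This formula is not given
in the source and is introduced here as an illustrative assumption. It
assumes isotropic antennas and ignores molecular absorption, which adds
further loss at THz frequencies. The 28 GHz and 1 THz carriers are example
values.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB between isotropic antennas."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / C)


def range_for_loss_m(max_loss_db: float, freq_hz: float) -> float:
    """Distance at which the free-space loss reaches max_loss_db."""
    return C / (4 * math.pi * freq_hz) * 10 ** (max_loss_db / 20)


mmwave, thz = 28e9, 1e12
extra = fspl_db(10, thz) - fspl_db(10, mmwave)
print(f"extra loss at 1 THz vs 28 GHz over 10 m: {extra:.1f} dB")
budget = fspl_db(100, mmwave)  # loss tolerated by a 100 m mm-wave link
print(f"same loss budget at 1 THz reaches {range_for_loss_m(budget, thz):.1f} m")
```

For the same loss budget and antenna gains, range scales inversely with
carrier frequency, so a link that spans 100 m at 28 GHz covers only a few
metres at 1 THz. This is consistent with the move from “small” to “tiny”
cells and with the reliance on highly directional beams.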


10.3.4 Visible Light Communication (VLC) 


Visible light communication (VLC) is a promising option for addressing
the growing need for wireless connectivity. VLC has been investigated for
several years and has already been applied in a variety of applications,
including indoor positioning systems and Vehicular Ad Hoc Networks
(VANETs). VLC offers greater bandwidth and resists electromagnetic
interference better than RF, which suffers from interference and
significant latency. The advancement of VLC technology has also been
aided by the advent of solid-state lighting. Because LEDs can switch
between different light intensities at a very fast rate, some researchers
have sought to use them for high-speed data transfer. LiFi is a VLC
system that supports multiple access and has the potential to provide
high-speed services to a large number of mobile users. However, some
shortcomings of VLC technology are impeding its advancement. Because
intense natural light interferes with transmissions, the main use cases
for VLC should be indoors. The security and privacy concerns raised by
VLC relate to malicious behavior and communication mechanisms. An
attacker must be in the line of sight of the victim in order to launch an
attack on a running VLC link, which makes it easier for the attacker to
be detected. A number of approaches and protocols can be used to secure
transmitted data, including SecVLC, a precoding solution for VLC links
that improves the physical layer’s security, and others. Furthermore, it
has been found that cooperating eavesdroppers can reduce the security of
VLC technology.
Comparison of communication performance indicators between 5G and 6G.

Communication performance indicator | Definition | 5G requirements | 6G requirements
Peak data rate | Maximum data rate a user/device can reach under ideal conditions | 20 Gbps | >100 Gbps
User experience data rate | Data rate achievable for users/mobile devices in the target coverage area | 0.1-1 Gbps | >1 Gbps
Communication latency | Time interval between sending packets at source and receiving them at destination | 1 ms | 0.1 ms
Area traffic capacity | Total traffic provisioned per geographic unit | 10 Mbit/s/m² | 1 Gbit/s/m²
Connection density | Total number of connected and/or accessible devices per unit of area | 1/m² | 10-100/m²
Mobility | Maximum relative speed between transmitter and receiver when certain QoS is met | 500 km/h | 1000 km/h
Reliability | Probability of successfully transmitting a fixed-size packet within the specified maximum time | 0.99999 | 0.9999999
Timing accuracy | Precision of time synchronization between devices | Microsecond level | Nanosecond level
Comparison of 5G services and 6G services.

Service category | Service | 5G services | 6G services
Communication | Basic telecom business | Basic telecom services, VoNR, new voice, 5G messaging, etc. | XR, holographic telepresence, multi-sensory interconnection, etc.
Communication | Data connection | On-demand mobile data connectivity | Higher-performance on-demand mobile data connectivity
Information | | UE positioning, some information | Provide basic information services natively, including wireless sensing, better network information distribution, and public sector information.
Computing | | - | Support convergent computing services natively, such as processing speed, storage, AI, etc.


10.4 6G Security Architectural Requirements 


Since 6G is intended to be a more open network than 5G, the boundary
between the inside and the outside of the network becomes increasingly
blurred. As a result, current network security measures such as IPsec and
firewalls are not strong enough to protect a network from external
intruders [3]. To mitigate this issue, the 6G security architecture must
support the essential zero trust (ZT) security concept in mobile
networks. ZT is a security paradigm that prioritizes the protection of
system resources.

The following paragraphs describe the security requirements that the 6G
network security architecture must address.

Virtualization Security Solution: Virtualization security issues require
the use of a system with a secure virtualization layer, which
incorporates security software/technology that detects hidden malicious
software, such as rootkits. The hypervisor must be able to fully isolate
the storage and network resources of different services, using secure
protocols such as TLS, SSH, and VPN. Virtual Machine Introspection (VMI)
is a hypervisor capability that examines and identifies security concerns
for each virtual machine (VM) by analyzing vCPU register information,
file I/O, and communication packets.

Automated Management System: The most important step in dealing with
open-source security issues is to control the vulnerabilities introduced
by using, updating and removing open-source software. Rapid threat
detection therefore requires an automated management system capable of
detecting vulnerabilities and applying updates. A further step is needed
to ensure that patched software is installed quickly and securely using a
secure over-the-air (OTA) method. In addition, a security governance
framework should be created to manage (1) long-term open-source
vulnerabilities, (2) changes in developer opinion, and (3) the deployment
of security solutions.

Data security using AI: To ensure that AI systems are protected against
adversarial machine learning (AML), they must be transparent about how
they protect users and mobile communication systems. The first step of
the process is to create AI models in a trusted system. In addition, a
technique such as a digital signature should be used to verify whether
the AI models running in the user equipment (UE), radio access network
(RAN) and core have been updated or modified maliciously. When a system
detects an unsafe AI model, it must perform self-healing or recovery
operations. The system should also limit the collection of AI training
data to trusted network elements.
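The integrity check on deployed AI models can be sketched as follows.
Python's standard library has no asymmetric signature scheme, so the
example substitutes an HMAC-SHA256 tag, a symmetric stand-in for the
digital signature mentioned in the text. A real deployment would sign
with a private key and verify with the matching public key. The key,
model bytes and function names are hypothetical.

```python
import hashlib
import hmac


def sign_model(model_bytes: bytes, key: bytes) -> str:
    """Compute an authentication tag over the serialized model."""
    return hmac.new(key, model_bytes, hashlib.sha256).hexdigest()


def verify_model(model_bytes: bytes, key: bytes, tag: str) -> bool:
    """Constant-time check that the model matches the tag issued at build time."""
    return hmac.compare_digest(sign_model(model_bytes, key), tag)


# Hypothetical values: a provisioning key and a serialized model blob.
key = b"provisioning-key-known-to-the-trusted-builder"
model = b"\x00\x01weights-of-a-beam-selection-model"
tag = sign_model(model, key)  # produced inside the trusted system

print("untouched model accepted:", verify_model(model, key, tag))
tampered = model + b"\xff"    # a malicious update appends a payload
print("tampered model accepted: ", verify_model(tampered, key, tag))
```

A network element that loads a model only after `verify_model` succeeds
can fall back to a known-good copy when the check fails, which is one
simple form of the recovery operation described above.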

Preserving Users’ Privacy: To maintain the confidentiality of the user’s
personal information, that information must be stored and used according
to the protocols agreed among the service provider, the mobile network
operator (MNO), and the subscriber. The 6G system protects personal
information in a Trusted Execution Environment (TEE) and trusted software,
and minimizes or anonymizes the data that is exposed when it is used.
Before the MNO releases personal information, its authenticity and the
permission to release it must be confirmed. Another option for user
information is homomorphic encryption (HE), which allows the data to be
processed while it remains encrypted. To protect user location and usage
privacy, AI-based solutions, such as learning-based privacy-aware
offloading systems, can be implemented.
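To make the idea of processing data in encrypted form concrete, the
following toy sketch uses the Paillier cryptosystem, an additively
homomorphic scheme chosen here only for illustration; the text does not
name a specific HE scheme. The primes are deliberately tiny and the
function names are hypothetical, so this is not secure and only shows the
principle.

```python
import math
import secrets


def keygen(p: int = 293, q: int = 433):
    """Toy Paillier key pair from two small primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt m (0 <= m < n) as (n+1)^m * r^n mod n^2 with random r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, private_key, c: int) -> int:
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
total = add_encrypted(n, encrypt(n, 12), encrypt(n, 30))
print(decrypt(n, private_key, total))
```

The party computing `add_encrypted` never sees 12 or 30, only ciphertexts.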

Post-Quantum Cryptography: Current asymmetric key encryption methods
should be abandoned in 6G systems because quantum computers would make
them insecure. Many scholars have focused on post-quantum cryptography
(PQC) solutions such as lattice-based cryptography, code-based
cryptography, multivariate polynomial cryptography, and hash-based
signatures. From 2022 to 2024, the US National Institute of Standards and
Technology (NIST) is expected to select the best PQC algorithms as part of
its PQC research. The key lengths currently under discussion for PQC are
expected to be several times those of Rivest-Shamir-Adleman (RSA), and PQC
is expected to be computationally more expensive than the current RSA
approach. Therefore, PQC must be integrated in accordance with the
hardware/software (HW/SW) service and performance requirements of the 6G
network.
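As an illustration of the hash-based signatures mentioned above, the
sketch below implements a Lamport one-time signature, one of the simplest
hash-based constructions. It is not one of the schemes under NIST
evaluation, the function names are hypothetical, and each key pair must
sign only a single message. It also makes the key-size point visible: the
private key and the signature are each thousands of bytes long.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """256 pairs of random secrets; the public key is their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(_h(a), _h(b)) for a, b in sk]
    return sk, pk


def _bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(sk, message: bytes):
    """Reveal one secret per digest bit (a key must be used only once)."""
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]


def verify(pk, message: bytes, signature) -> bool:
    return all(
        _h(part) == pk[i][bit]
        for i, (part, bit) in enumerate(zip(signature, _bits(message)))
    )


sk, pk = keygen()
sig = sign(sk, b"6G handover request")
print(verify(pk, b"6G handover request", sig))
print(sum(len(part) for part in sig), "signature bytes")
```

Security rests only on the hash function, which is why hash-based schemes
are considered candidates for the post-quantum setting.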


6G Security Challenges 
This section discusses some of the challenges related to AI/ML in the 6G
system.


• Trustworthiness: As AI manages cybersecurity, the stability of machine
learning models and components becomes important.

• Visibility: Security operations based on AI and machine learning must be
monitored in real time to ensure controllability and reliability.

• Ethical and legal aspects: Some customers or applications may be limited
by AI-based optimization strategies. Do AI-powered security solutions
protect all users uniformly? Who is responsible when AI-controlled
security services fail? [4]

• Extensibility and viability: Secure data exchange is necessary to
protect the privacy of federated learners. The required compute, network,
and storage resources need to be scalable, which is a hurdle for AI/ML.

• Controlled security tasks: Using AI/ML security solutions in combination
with large-scale data operations can incur significant overhead [5].


During both learning and inference, the model must remain secure and
resilient.

Sensing performance indicators for two system configurations.


System parameters                    Configuration 1    Configuration 2
-----------------------------------  -----------------  -----------------
Central frequency                    6 GHz              30 GHz
Bandwidth                            400 MHz            2 GHz
Number of antenna elements /         256 / 8 dBi        512 / 8 dBi
  element gain
Transmitting power of BS             55 dBm             40 dBm
Inter-site distance                  500 m              200 m
Target maximum velocity              120 km/h           120 km/h
Coherent processing interval         5 ms               1 ms

Sensing performance at cell edge
  Distance resolution                0.375 m            0.075 m
  Velocity resolution                5 m/s              5 m/s
  Angular resolution                 7.2°/7.2°          3.6°/7.2°
    (azimuth/zenith)
  Distance accuracy                  ~0.1 m             ~0.1 m
  Velocity accuracy                  ~1 m/s             ~7 m/s
  Angular accuracy                   ~2°/2°             ~5°/10°
    (azimuth/zenith)
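
The distance and velocity resolutions in the table are consistent with the
standard radar relations ΔR = c / (2B) and Δv = λ / (2 T), where B is the
bandwidth, λ the carrier wavelength, and T the coherent processing
interval. The source does not state which formulas were used, so the
sketch below is only a cross-check under that assumption; the function
names are hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s


def range_resolution(bandwidth_hz: float) -> float:
    """Distance resolution in metres: c / (2 * B)."""
    return C / (2.0 * bandwidth_hz)


def velocity_resolution(carrier_hz: float, cpi_s: float) -> float:
    """Velocity resolution in m/s: wavelength / (2 * CPI)."""
    wavelength = C / carrier_hz
    return wavelength / (2.0 * cpi_s)


configurations = [
    ("configuration 1", 6e9, 400e6, 5e-3),
    ("configuration 2", 30e9, 2e9, 1e-3),
]
for name, carrier, bandwidth, cpi in configurations:
    print(
        f"{name}: {range_resolution(bandwidth):.3f} m, "
        f"{velocity_resolution(carrier, cpi):.2f} m/s"
    )
```

Both configurations give roughly 5 m/s because the shorter wavelength at
30 GHz offsets the shorter processing interval.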
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Typical use cases and application scenarios for ISAC 


Category of ISAC  Use cases                           Application scenarios
use cases
----------------  ----------------------------------  ------------------------------
Coarse-grained    Monitoring the weather and the      Agriculture, meteorology, and
sensing           quality of the air                  daily living services

                  Traffic flow detection,             Smart transportation, security
                  pedestrian volume statistics,       surveillance
                  and intrusion detection

                  Localization, tracking, and         Radar application scenarios
                  measurement of the target
                  object's range, speed, and angle

Fine-grained      Mapping the environment             Car and UAV (unmanned aerial
sensing                                               vehicle) navigation, smart
                                                      driving, and smart city

                  Face, motion, and position          Interactive intelligence,
                  recognition                         gaming, and smart homes

                  Monitoring of vital indicators      Medical and health care
                  (heartbeat, respiration, etc.)

                  Imaging, detecting materials,       Industry, biomedicine, and
                  and analyzing composition           security inspection


10.5 Future Enhancements

1. Since DLT and 6G are expected to work together, vulnerabilities in
blockchain and smart contracts can have an unintended impact on 6G
networks. When implementing a DLT/blockchain solution on a 6G network,
users should always follow the available procedures to mitigate the above
security threats. However, certain security techniques may matter more for
a public blockchain than for a private one. For example, smart contracts
are deployed on every node in the blockchain network, so debugging and
modifying them can be time-consuming. Because smart contracts enable
automation in DLT/blockchain systems, it is important to make sure they
are accurate, and they need to be properly validated for correct
functionality before being deployed to hundreds of blockchain nodes.

2. Quantum computing is likely to become commercially available in the
coming years and poses a significant threat to current cryptography.
Quantum computing is also intended for use in 6G communication networks to
detect, mitigate, and prevent security gaps. Scientists have already begun
exploring quantum-resistant technologies and cryptographic solutions to
prepare for the threat posed by quantum computing in the future 6G era.
Lattice-based, code-based, hash-based, and multivariate-based cryptography
are among the post-quantum cryptography primitives. Lattice-based schemes
currently work better with IoT devices, and their short key length makes
them suitable for 32-bit architectures.

3. AI and machine learning are expected to play a major role in 6G. On the
other hand, AI and ML make 6G intelligent network management systems
vulnerable to AI/ML-related threats. There are various AI/ML methods to
counter these dangers. To improve resilience, adversarial training inserts
modified instances, similar to attacks, into the training data. Another
defense method is defensive distillation. It uses the output of a
previously trained network and is based on the idea of transferring
knowledge from one neural network to another via soft labels that reflect
the probabilities of the different classes. These are used for training
instead of hard labels that assign each sample to a single class. Both
solutions have been successful against both evasion and adversarial
attacks.

4. Unauthorized receivers can intercept signals transmitted in line of
sight (LOS) even with very narrow beams. As a result, THz communications
remain exposed to attacks on data transmission, eavesdropping, and access
control. There is evidence that an unauthorized user can intercept the
signal by placing an object in the transmission path and scattering
radiation toward their own receiver. Characterizing the channel
backscatter has been proposed as a way to detect some, if not all,
eavesdroppers.
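
The soft labels used in defensive distillation (item 3 above) can be
sketched as a temperature-scaled softmax over the outputs of the
previously trained network. The logits and the temperature below are
made-up values, and only the label construction is shown, not the training
of the second network.

```python
import math


def softmax(logits, temperature: float = 1.0):
    """Temperature-scaled softmax; a higher temperature gives softer labels."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


teacher_logits = [4.0, 1.0, 0.5]  # hypothetical outputs of the trained network

# Hard label: all probability mass on the single most likely class.
best = teacher_logits.index(max(teacher_logits))
hard_label = [1.0 if i == best else 0.0 for i in range(len(teacher_logits))]

# Soft label: class probabilities that the second network is trained on.
soft_label = softmax(teacher_logits, temperature=5.0)

print(hard_label)
print([round(p, 3) for p in soft_label])
```

The soft label keeps the same ranking of classes as the hard label but
also carries information about how similar the classes are.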
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6G efficiency indicators 


Spectral efficiency
  Communication indicators
    Definition: throughput provided per cell per frequency resource.
    6G requirements: 2-3 times higher than 5G.
  Sensing indicators
    The time and frequency resources required to complete one sensing task.

Energy efficiency
  Communication indicators
    Definition: number of bits that can be transmitted per unit of energy,
    or the amount of energy required to transmit 1 bit.
    6G requirements: more than 100 times improvement in network energy
    efficiency compared to 5G; 10 to 100 times improvement in terminal
    energy efficiency compared to 5G.
  Sensing indicators
    Energy required to complete one sensing task.
  Computing indicators
    Number of operations available per unit of energy.

Cost efficiency
  Communication indicators
    Definition: the number of bits that can be transmitted per unit cost,
    or the cost required to transmit 1 bit.
    6G requirements: more than 100 times improvement compared to 5G.
  Sensing indicators
    Cost to complete one sensing task.
  Computing indicators
    Number of operations available per unit cost.

10.6 Summary


We are currently in the early stages of the 5G commercialization process,
which is expected to usher in a significant revolution, or at the very
least an evolution, in the mobile wireless communications industry. The
revolutionary component of 5G is how it extends previously limited use
cases, such as mobile internet, to include ultra-reliable low-latency
communications and massive machine-type communications. As a result, the
mobile communications industry coined the phrase “5G triangle.” Although
5G has only recently been deployed in several countries and regions and is
still in its early stages, researchers and developers are actively working
on the creation of 6G, particularly in relation to new communication
techniques and technologies. The sixth generation is an upcoming mobile
system that would dramatically alter future mobile wireless networks, and
it is projected to be deployed in 2030. According to this analysis, the
“5G triangle” will grow into the “6G hexagon” as more dimensions are added
to open new industrial use cases. Furthermore, 6G is expected to deliver a
data rate of up to 10 Tbps and an ultra-low latency of 10-100 µs. It will
also improve spectrum efficiency and connection density by a factor of
10-100 over 5G. In addition, 6G will enable an intelligent Internet of
Everything (IoE) and the goal of everything being connected 100 percent
intelligently, bringing new applications beyond IoE. The sixth generation
ushers in a new era of seamless machine-to-human interactions, object
intelligence, and the merging of the virtual and physical realms. The core
pillars of the 6G vision are ultra-high reliability, ultra-high
flexibility, ultra-high privacy and security, and ubiquitous coverage.

Abstract

Intelligent information gathering, analysis, and monitoring reduce human
effort in almost every field of application, such as e-healthcare,
industrial manufacturing, e-voting, and intelligent transportation
systems. However, involving smart devices and systems in processing and
intelligent decision-making exposes the network to a number of severe
security attacks. Intruders may change the communication pattern and make
various changes inside the network for their own benefit. The aim of this
paper is to propose an efficient, trust-based information forwarding
mechanism for IoT systems. The proposed approach provides security while
smart devices transfer, collect, and analyze information or take
decisions. The proposed approach is validated against a traditional
mechanism over various security metrics.

11.1 Introduction


In recent years, the demand for intelligent and smart systems has been
increasing at a very high pace. Organizations are adopting every new
technology that arrives in the market to improve their industrial growth
[1, 2]. IoT-based applications such as smart industrial manufacturing,
smart home automation, intelligent transportation, and e-healthcare
systems are a few ongoing research areas where IoT systems are replacing
human interaction in taking shipping/manufacturing decisions,
gathering/analyzing patient records, or intelligently switching home
equipment [3, 4]. Although adopting such automated, intelligent systems
has a number of advantages, they are still not fully adopted by all
organizations in the market. The most severe issue is security:
organizations are afraid that an intruder or third party in the network
will attack, steal, or control their confidential information [5, 6].


11.1.1 Need of Security 


Whenever an organization adopts an intelligent automation system, it is
that organization’s responsibility to provide 100% confidentiality and
security to its clients and users. Any breach may drastically affect the
security of the system and, in turn, cost the organization the trust of
its clients [7, 8]. Figure 11.1 presents an organization’s intelligent
communication process for data forwarding and collection, in which a
number of industrial IoT devices take independent decisions and accurate
actions as required by their environment.

Various security schemes have been proposed by several authors; however,
trust-based methods are the most general and effective approach to
providing an efficient communication process in the network [9, 10].
Communication carried out by highly trusted nodes may further improve
system performance and overall throughput, which may directly improve the
growth of an organization.


11.1.2 Role of Trust-Based Mechanism in IoT Systems 


In IoT systems, ensuring security against malware and intruders plays a
significant role when information is distributed among devices. Having
discussed the importance of smart devices and their need for security
while communicating with each other, this subsection discusses security
techniques that use trust-based and blockchain mechanisms in the network.
A number of trust-based systems have been proposed by various authors,
such as direct trust, indirect trust, and hybrid trust [11]. In addition,
the blockchain mechanism is used to ensure transparency while providing
security during the sharing of information among smart devices.

Figure 11.1 A hybrid IIoT architecture [9].

The architecture of the trust-based and blockchain network is presented in
Figure 11.2, which shows the trust computation associated with each IoT
device. The trust of each device is computed using techniques such as
direct, indirect, and hybrid mechanisms, and is stored in a database along
with the blockchain mechanism. In addition, the blockchain mechanism is
integrated with trust-based devices to ensure transparency among devices
[12, 13]. Blockchain and trust-based intelligent devices are used in
various applications such as Industry 4.0, smart homes, smart cities, and
intelligent transportation systems. The text below explains Industry 4.0
along with blockchain and trust-based systems.

Figure 11.3 presents the role of trust-based and blockchain mechanisms in
smart devices. It takes the example of the industrial Internet of Things,
where a number of communications and distributions are accomplished
through intelligent devices. Here, manufacturing, recording of products,
autonomous robots, system integration, and product distribution are
handled by intelligent devices. The security of the intelligent devices is
ensured using a trust-based mechanism in which the trust of each device is
computed before information is distributed. Further, the blockchain
mechanism is integrated with the trust-based mechanism to ensure
transparency among the devices [14, 15].


11.1.3 Contribution


The aim of this paper is to propose an efficient and secure trust-based
mechanism that forwards data to neighboring nodes in the network only
after computing their trust values. The trust value of each node is
computed from its behavior and decides whether communication with that
node is accepted or blocked [16]. In addition, the blockchain mechanism is
combined with the trust-based system to further ensure transparency in the
network. The proposed work is implemented and validated against the
traditional approach used by other researchers over several security
metrics: accuracy, data alteration rate, and the blockchain network of
legitimate devices. All the parameters are discussed with graphs and
explanation.

The remainder of the paper is organized as follows. Section 11.2 describes
the literature survey and related work proposed by various authors to
ensure a secure communication process. Section 11.3 presents the proposed
trust-based information forwarding mechanism in depth, and Section 11.4
describes the blockchain network. The implementation and validation are
discussed in Sections 11.5 to 11.7. The final section concludes the paper
along with some future directions.


11.2 Related Works 


This section discusses the various security methods, such as
authentication, key management, encryption, blockchain-based, and
trust-based schemes, proposed by various researchers. Table 11.1 compares
the already proposed schemes/methods along with their measuring
environments and limitations in the various applications.


Table 11.1 Literature survey. (Continued)

Rathee et al. [22]: Secure vaccine distribution
  Measuring parameter: The authors proposed an ANN scheme to provide a
  viable, blockchain-based mechanism for distributing the vaccine in the
  network.
  Limitation: Needed to analyze the computation delay of trust.

Lashmi and Pillai [23]: Decision-making approach
  Measuring parameter: The authors projected a home-based mechanism that
  focuses on face recognition and anomaly detection schemes to ensure
  secure communication in the network.
  Limitation: Need to monitor continuously to ensure secure communication
  in the network.

Lee et al. [24]: Trusted framework based on ontology
  Measuring parameter: The proposed mechanism is analyzed by determining
  the highest degree of trust based on the trusted ontology and estimating
  the degree of trust for each communicating device.
  Limitation: The authors have not considered the real-time scenario while
  devices communicate with each other.

Sindhuja [17] proposed an efficient and secure routing mechanism that
selects cluster heads for congestion-free IoT systems. The author used a
sub-strategy together with load-adjusting and proficiency approaches while
ensuring security in the network. Abed [18] discussed the importance of
wireless networks, IoT, and blockchain for efficient and effective
communication in the network. The author highlighted the importance of
combining the web of things with IoT devices and blockchain technology,
and pointed out some open issues for the research community in ensuring a
better information transmission process. Rathee et al. [19] proposed a
trusted, blockchain-based cybersecurity mechanism for IIoT. The authors
used a trust evaluation mechanism to ensure a secure and transparent
communication process once blockchain technology is integrated in the
network, and verified the proposed approach over various security
measures.

Rajavel et al. [20] proposed a trust-aware pricing scheme for vehicular
IoT. The authors proposed a MobiTrust mechanism for identifying various
types of evidence in the network. Further, they proposed a trust-based
resource and pricing allocation mechanism with the aim of maximizing
profits in the network. The proposed mechanism was simulated over various
benchmarks.

Lashmi and Pillai [23] proposed a decision-making approach for detecting
security breaches in which intruders may compromise a number of devices in
the network. The authors proposed a home-based mechanism that focuses on
face recognition and anomaly detection schemes to ensure secure
communication in the network. They further validated the proposed
mechanism by identifying various illegal activities of intruders through
regular monitoring and capture of the IoT devices’ communication.

Lee et al. [24] proposed a trusted framework based on an ontology of
individuals according to their perspectives and purposes. The proposed
mechanism was analyzed by determining the highest degree of trust based on
the trusted ontology and estimating the degree of trust for each
communicating device.

A number of secure communication and information transmission schemes have
been proposed by various researchers; however, trust-based security, in
which computation and communication determine the legitimacy of each
communicating device, is still at an early stage. Further, the proposed
schemes have not been fully adapted to present-day IoT devices.

Although various security schemes have been proposed, they have their own
limitations and drawbacks, which leave the security of IoT-based
applications an open research question. This paper proposes a trust-based
information forwarding scheme for IoT-based systems, with its validation
and implementation steps described in the subsequent sections. In
addition, the proposed scheme is validated against existing and
traditional mechanisms over a number of security metrics.

11.3 Estimated Trusted Model


Trust is considered one of the significant performance metrics for
analyzing the legitimacy of each communicating device in the network. A
number of trust-based schemes have been proposed and used by various
researchers for computing the trust of each device, such as direct trust,
indirect trust, hybrid trust, and estimated trust. Each trust-based
approach has its own significance depending on the communication pattern
and the information transmission scenario. In this paper, we use the
estimated trust value, where the trust of a communicating device depends
on how it forwards information to its neighboring devices. The estimated
trust value analyzes legitimacy using three factors: total trust,
processed trust, and estimated trust. The neighboring devices estimate a
trust value depending on the device’s communication behavior or
transmission of information [25].

Designing a trusted information mechanism is crucial for any application
with IoT devices in the network. The system architecture of the proposed
model is illustrated in Figure 11.1, which has several types of
data-forwarding IoT devices/nodes in the network. The trust computation of
an IoT device mainly consists of three parts, as presented in eq. (11.1),
where $T_{total}$ represents the total computed trust and $T_e$, $T_p$,
and $T_t$ denote the trust computed for estimated, processed, and
transmitted information, respectively.


$$T_{total} = T_e + T_p + T_t \qquad (11.1)$$



In addition, T| can be defined as the trust computation for neighboring 
device, T,,,, is the amplifying trusted value estimated by the nodes i, and d 
is defined as the transmission distance. 


T, =nx T, 


11:2 
T, =n x (T, + Typ) —— 


Therefore, the overall trust computation is further defined as: 


Tig = eM + a) + Z (11.3) 
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Further, in order to accurately recover the original information by the 
neighboring nodes, the gathered trust can be further defined as the basis 
vector over sparse basis and basis vector as: 


Ae] Ow, = by x N“ (11.4) 


Where, X= x,, A ek t= (1,2, ....N) anda = 5 Op xccie Ol 


11.4 Blockchain Network 


A large number of wireless communication devices are now present in
networks for the convenience of users. In applications such as healthcare
systems, smart cities, smart homes, and the industrial Internet of Things,
many intelligent devices carry out the communication in the network. These
smart devices need proper privacy and security while communicating and
transmitting information. A number of security protocols and algorithms
have been proposed by various researchers; however, a number of cyber and
security issues remain. Blockchain technology is considered one of the
recent paradigms for ensuring secure and transparent communication in the
network, although only a few organizations have adopted it and it is still
at an early stage. To ensure security and transparency, a blockchain
network is maintained. The blockchain consists of all those nodes that are
trusted and are surveilled by the network to ensure transparency during
communication. The integration of the trust mechanism with the blockchain
network of trusted nodes is presented in Algorithm 1, which illustrates
the trust computation and blockchain mechanism for ensuring secure
communication. The estimated trust approach is used to analyze the
legitimacy of each communicating device; the estimated value of each
device determines whether further transmission of information is
permitted. In addition, the blockchain mechanism maintains continuous
surveillance of the network for better protection of the devices.

Figure 11.4 Flowchart of proposed model.

Algorithm 1
Begin
Step 1: Input: Number of IoT devices N
        Output: Each device is classified as ideal or fraud.
        Given: Trust-based computation and a blockchain network.
Step 2: a) Establish the networking environment
           For all nodes I_p, p = 1, 2, ..., N
               Compute the trust of device I_p using the estimated
               trust model
               If (device is ideal) then
                   Permit further communication
               Else
                   Block/deny further communication
               End If
           End For
        b) Maintain a blockchain network of highly trusted devices.
Step 3: Each ideal device is kept under surveillance using the blockchain
End


The flowchart of Algorithm 1 is presented in Figure 11.4.

The aim of this paper is to propose a secure and efficient trust-based
communication process among IoT devices in the network. To conduct further
surveillance, a blockchain network is maintained that keeps a record of
the highly trusted nodes providing services in the network.
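
The steps of Algorithm 1 can be sketched in Python as follows. This is a
minimal illustration under stated assumptions, not the authors'
implementation: the trust score is a placeholder (the share of packets a
device forwarded unaltered), the threshold of 0.7 is hypothetical, and the
"blockchain" is a simple in-memory hash chain of trusted devices without
consensus or networking.

```python
import hashlib
import json

TRUST_THRESHOLD = 0.7  # hypothetical cut-off; the chapter does not state one
GENESIS_HASH = "0" * 64


def estimated_trust(device: dict) -> float:
    """Placeholder trust score: share of packets forwarded unaltered."""
    if device["received"] == 0:
        return 0.0
    return device["forwarded"] / device["received"]


def block_hash(block_body: dict) -> str:
    payload = json.dumps(block_body, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def run_algorithm(devices):
    """Classify each device and record the trusted ones in a hash chain."""
    decisions, chain, prev_hash = {}, [], GENESIS_HASH
    for device in devices:
        trust = estimated_trust(device)
        ideal = trust >= TRUST_THRESHOLD
        decisions[device["id"]] = "permit" if ideal else "block"
        if ideal:
            block = {"index": len(chain), "device": device["id"],
                     "trust": round(trust, 3), "prev_hash": prev_hash}
            block["hash"] = block_hash(block)  # hash of the body fields only
            chain.append(block)
            prev_hash = block["hash"]
    return decisions, chain


def chain_is_valid(chain) -> bool:
    """Surveillance step: any edited record breaks the hash links."""
    prev_hash = GENESIS_HASH
    for block in chain:
        body = {k: v for k, v in block.items() if k != "hash"}
        if block["prev_hash"] != prev_hash or block_hash(body) != block["hash"]:
            return False
        prev_hash = block["hash"]
    return True


devices = [
    {"id": "sensor-1", "forwarded": 95, "received": 100},
    {"id": "sensor-2", "forwarded": 40, "received": 100},
    {"id": "sensor-3", "forwarded": 88, "received": 100},
]
decisions, chain = run_algorithm(devices)
print(decisions)
print(len(chain), chain_is_valid(chain))
```

Replacing `estimated_trust` with the trust model of Section 11.3 would
leave the rest of the loop unchanged.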


11.5 Performance Analysis 


The simulation of the proposed framework is validated against Rajavel
et al. (the baseline approach), where the authors proposed a cryptographic
and key management scheme for ensuring secure communication among smart
devices in the network. They proposed a trust-aware pricing scheme for
vehicular IoT, together with a MobiTrust mechanism for identifying various
types of evidence in the network. Further, they proposed a trust-based
resource and pricing allocation mechanism with the aim of maximizing
profits in the network, and simulated the mechanism over various
benchmarks.

The performance of the proposed mechanism is validated against Wu et al.
over security metrics such as accuracy and data alteration rate. The
number of devices is currently set to 50 for analyzing the behavior of
each device during transmission of information in the network. The
proposed and existing mechanisms are analyzed over a small area of the
network to check the accuracy and competence of the trust computation
along with its complexity. The data alteration and accuracy metrics can be
evaluated easily over a small part of the network to determine the
behavior of each communicating device. The proposed mechanism can be
further analyzed over a larger area of the network; as the network grows,
the devices may attain sustainability and learn the dynamic pattern of
information transmission. The performance measured while analyzing the
various metrics may further affect the nature of the communicating nodes
in the network.


11.5.1 Dataset Description and Simulation Settings 


The proposed framework is tested on a synthesized dataset that contains a
number of legitimate devices when the network is established. Devices are
then altered intentionally to demonstrate the accuracy of the proposed
framework relative to the existing scheme. The simulation is done in
MATLAB over a 700 x 700 area with 50 nodes. Nodes are altered at a rate of
10% as the network size increases from 5 to 50 nodes. Each node can start
communicating or transmitting information in the network at any time.
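
A setup of this kind could be generated as in the sketch below. The
chapter's simulation was done in MATLAB and its exact procedure is not
given; this Python version assumes uniformly random node placement, a step
of 5 between network sizes, and rounding the 10% alteration count up so
that even the smallest network has one altered node. All names are
hypothetical.

```python
import random

AREA_SIDE = 700      # side of the square area (units are not stated)
ALTER_PERCENT = 10   # share of nodes altered, from the text


def build_network(num_nodes: int, seed: int = 0):
    """Place nodes uniformly at random and mark 10% of them as altered."""
    rng = random.Random(seed)
    nodes = [
        {"id": i,
         "x": rng.uniform(0, AREA_SIDE),
         "y": rng.uniform(0, AREA_SIDE),
         "altered": False}
        for i in range(num_nodes)
    ]
    # Ceiling of 10% using integer arithmetic (assumed rounding rule).
    num_altered = -(-num_nodes * ALTER_PERCENT // 100)
    for node in rng.sample(nodes, num_altered):
        node["altered"] = True
    return nodes


for size in range(5, 51, 5):
    network = build_network(size)
    print(size, sum(node["altered"] for node in network))
```

Fixing the seed makes each network reproducible, which helps when the same
topology has to be evaluated with two different mechanisms.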


11.5.2 Comparison Methods and Evaluation Metrics 


The proposed mechanism is compared against the existing scheme using the
following metrics:


Accuracy: This metric measures how often each approach correctly
identifies the altered and legitimate nodes in the network.
Data alteration rate: This metric measures the rate at which an intruder
can successfully alter messages while they are transmitted or forwarded in
the network.
Blockchain network: This metric measures the block size and the time
required to validate a particular node during communication or before it
is added to the blockchain network.
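
The chapter gives no formulas for these metrics. Under the common
definitions (fraction of nodes classified correctly, and fraction of
messages that arrive altered), the first two could be computed as in the
following sketch; the function names and sample data are hypothetical.

```python
def accuracy(true_altered, predicted_altered) -> float:
    """Fraction of nodes whose altered/legitimate status is identified correctly."""
    correct = sum(t == p for t, p in zip(true_altered, predicted_altered))
    return correct / len(true_altered)


def data_alteration_rate(sent_messages, received_messages) -> float:
    """Fraction of messages whose content changed in transit."""
    altered = sum(s != r for s, r in zip(sent_messages, received_messages))
    return altered / len(sent_messages)


# Ground truth versus the detector's verdict for four nodes.
truth = [True, False, False, True]
verdict = [True, False, True, True]
print(accuracy(truth, verdict))

# Messages as sent versus as received after forwarding.
sent = ["m1", "m2", "m3", "m4"]
received = ["m1", "mX", "m3", "m4"]
print(data_alteration_rate(sent, received))
```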

The above evaluation metrics are analyzed for both the proposed and the
existing mechanism. The reasons why the proposed approach outperforms the
existing one are discussed, with graphs, in the results discussion section
below.


11.6 Results Discussion 


This section presents the results and discusses the proposed mechanism
using graphs of accuracy, data alteration rate, and the blockchain of
legitimate devices. Figure 11.5 depicts the accuracy of the proposed and
existing approaches as the number of altered nodes increases in a network
of 5 to 50 devices. The proposed mechanism outperforms the existing
mechanism because of its trust computation: the computed trust value
ensures that only legitimate nodes are involved when communicating or
forwarding information to neighboring nodes.

The computed trust value does not incur the overhead of any external
memory or delay while ensuring security in the network. Figure 11.6
presents the data alteration rate, which represents the number of nodes
that intruders can successfully alter once the network is established. The
proposed approach performs better in this case because only nodes with
higher trust values can take part in the communication process, while
nodes with lower trust values are never part of it. Filtering nodes by
their computed trust values therefore improves the data alteration rate in
comparison to the existing approach.

Finally, Figure 11.7 represents the blockchain of legitimate devices in
the network. To surveil the entire network and provide security without
increasing overhead or delay, the mechanism may further include the
blockchain process. The blockchain network of legitimate nodes further
ensures transparency and provides higher-level security during
communication in the network. The existing mechanism, by contrast,
generates considerable communication and computational overhead along
with increased delays in the network.

Figure 11.7 Blockchain of legitimate devices.
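
As a rough illustration of how a chain of legitimate devices makes
tampering visible, the sketch below links device records by SHA-256
hashes. The record fields, the genesis value, and the function names are
hypothetical; the chapter does not specify a block format or a consensus
procedure, and none is modeled here.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's parent


@dataclass(frozen=True)
class Block:
    index: int
    device_id: str
    trust: float
    prev_hash: str

    def digest(self) -> str:
        """SHA-256 over a canonical JSON encoding of the block's fields."""
        payload = json.dumps(
            {"index": self.index, "device_id": self.device_id,
             "trust": self.trust, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_device(chain: List[Block], device_id: str,
                  trust: float) -> List[Block]:
    """Return a new chain with one more legitimate device recorded."""
    prev_hash = chain[-1].digest() if chain else GENESIS_HASH
    return chain + [Block(len(chain), device_id, trust, prev_hash)]


def is_valid(chain: List[Block]) -> bool:
    """Check that every block points at the digest of its predecessor."""
    expected = GENESIS_HASH
    for position, block in enumerate(chain):
        if block.index != position or block.prev_hash != expected:
            return False
        expected = block.digest()
    return True
```

Changing the stored trust value of any block other than the last one
alters its digest, so the following block no longer links to it and
`is_valid` returns `False`.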


11.7 Empirical Analysis 


The empirical analysis further assesses the overall performance of the
system by identifying its computation overhead, delays, and security
concerns while analyzing the dynamic patterns of communicating devices in
the network. The proposed and existing mechanisms are compared again on
the following empirical factors:

Computation overhead: each device must perform some computation to
identify the legitimacy of other devices. The computation can be done on
the communicating device itself, or devices may communicate with each
other for further analysis of the network.

Response delay: when devices are analyzed by communicating with each
other, the delay may be high. Communication among devices while
identifying the legitimacy of each communicating device may result in
large response delays, which may in turn expose the network to a number
of security threats.

Security concerns: dynamic communication in the network may invite a
number of networking and communication attacks. Security can be further
compromised when intruders learn the patterns of communicating devices
and mimic a legitimate device in the network.

This empirical study is applied to the proposed and existing scenarios,
where the proposed mechanism outperforms existing methods because of its
low-overhead trust algorithm. The estimated trust degree can readily
establish the legitimacy of a device without heavy computation in the
network. In addition, response delays and security attacks can be traced
using blockchain-based technology. Every communicating device is placed
on the blockchain for continuous surveillance and tracing in the network.

The time complexity of the proposed scheme is expressed in Big-O notation
and is driven by sensor node usage and battery life while transmitting
information in the network. Every device communicates with the others
while transmitting information. The proposed scheme adopts a forwarding
mechanism in which the trust value of each device is decided by an
estimated trust scheme, where the trust values of neighboring devices are
used to finalize the legitimacy of the current device. Each device takes
some amount of time to reach the final decision to allow a communicating
device or block it from the network. In a network of size ‘N’, the trust
values of ‘N’ different devices are computed after the devices communicate
with each other. Therefore, the overall complexity of the algorithm
(Algorithm 1 above) is expressed in Big-O notation in terms of ‘N’, or
more simply as the time taken by each device to execute the trust
computation involved in transmitting information in the network.
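
To make the dependence on network size concrete, the following sketch
computes a trust value for every device from its neighbors' ratings and
counts how many ratings are read. It is not Algorithm 1: the averaging
rule, the `ratings` structure, and the function name are assumptions
chosen only to show how the work scales.

```python
from typing import Dict, Tuple


def estimate_trust(
    ratings: Dict[str, Dict[str, float]],
) -> Tuple[Dict[str, float], int]:
    """Average the neighbors' ratings of each device.

    `ratings[device][neighbor]` is the (hypothetical) rating that
    `neighbor` assigns to `device`. Returns the trust values and the
    number of ratings read, as a simple cost measure.
    """
    trust: Dict[str, float] = {}
    reads = 0
    for device, by_neighbor in ratings.items():
        values = list(by_neighbor.values())
        reads += len(values)
        # A device with no neighbors gets zero trust in this sketch.
        trust[device] = sum(values) / len(values) if values else 0.0
    return trust, reads
```

In this sketch the number of ratings read equals the total number of
neighbor relations, so for N devices it is at most N(N - 1) when every
device rates every other device, and proportional to N when each device
has a bounded number of neighbors.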


11.8 Conclusion


This chapter has proposed a secure and trusted communication procedure
for forwarding information among nodes in a network of IoT devices. The
proposed system computes the trust of each communicating node from its
processed, estimated, and transmitted ratios, which then decides whether
the node's remaining communication in the network is allowed or blocked.
The proposed mechanism is implemented and validated against various
security protocols in comparison with a traditional approach. The graphs
and results show that the proposed approach outperforms the traditional
one because of continuous surveillance and trust value computation before
devices are permitted to communicate in the network. Dynamic pattern
recognition and the associated security threats are left as a future
direction of this work.